First- and Second-Order Coding Theorems for 
Mixed Memoryless Channels with General Mixture 

Hideki Yagi, Te Sun Han, and Ryo Nomura 


Abstract 

This paper investigates the first- and second-order maximum achievable rates of codes with/without cost 
constraints for mixed channels whose channel law is characterized by a general mixture of (at most) uncountably 
many stationary and memoryless discrete channels. These channels are referred to as mixed memoryless channels 
with general mixture and include the class of mixed memoryless channels of finitely or countably many memoryless 
channels as a special case. For mixed memoryless channels with general mixture, the first-order coding theorem, 
which gives a formula for the $\varepsilon$-capacity, is established, and then a direct part of the second-order coding theorem 
is provided. A subclass of mixed memoryless channels whose component channels can be ordered according to 
their capacity is introduced, and the first- and second-order coding theorems are established. It is shown that the 
established formulas reduce to several known formulas for restricted scenarios. 


I. Introduction 

Investigation of the maximum achievable rate of codes whose probability of decoding error does not 
exceed $\varepsilon \in [0,1)$ for various coding systems has been one of the major research topics in information theory. 
The first-order optimum rate for channel codes with such a property is referred to as the $\varepsilon$-capacity. 
Inspired by the recent results of second-order coding theorems given, for example, by Hayashi [?] and 
Polyanskiy, Poor, and Verdú [11] for stationary memoryless channels, this research topic has become of 
greater importance from both theoretical and practical viewpoints. 

It is well known that stationary memoryless channels with finite input and/or output alphabets have the 
so-called strong converse property, and the $\varepsilon$-capacity coincides with the channel capacity ($\varepsilon$-capacity with 
$\varepsilon = 0$) [19]. On the other hand, allowing a decoding error probability up to $\varepsilon$, the maximum achievable 
rate may be improved for non-stationary and/or non-ergodic channels. The simplest example is a class of 
mixed channels [?], also referred to as averaged channels [1], [?] or decomposable channels [?], whose 
probability distribution is characterized by a mixture of multiple stationary memoryless channels. This 
channel is stationary but non-ergodic and is of theoretical importance when extensions of coding theorems 
for ergodic channels are addressed. 

For general channels including mixed channels, a general formula for the $\varepsilon$-capacity has been given 
by Verdú and Han [14]. This formula, however, involves limit operations with respect to code length $n$, 
and thus is infeasible to compute in general. On the other hand, for mixed channels of uncountably many 
stationary and memoryless discrete channels, which will be called general mixed memoryless channels, a 
single-letter characterization of the channel capacity has been given by Ahlswede [1] for the case without 
cost constraints and by Han [5] for the case with cost constraints. These characterizations are of importance 
because the channel capacity may be computed with complexity independent of $n$. Recently, Yagi and 
Nomura [20] have provided a single-letter characterization of the $\varepsilon$-capacity with/without cost constraints 
for mixed channels of at most countably many stationary memoryless channels. Regarding the $\varepsilon$-capacity 
for mixed memoryless channels with general mixture, however, no characterizations have been given in 
the literature. The regular decomposable channel, which consists of memoryless channels [?], is one of 
a few examples for which a single-letter characterization of the $\varepsilon$-capacity is known. In addition, the 
second-order optimum rate has been characterized only for a few classes of mixed memoryless channels 
such as the mixed channel of two memoryless additive channels [?], the mixed channel of finitely many 
stationary and memoryless discrete channels which can be ordered according to their capacities [21], and 
block fading channels characterized as the mixed channel consisting of additive Gaussian noise channels [22]. 

This paper first gives a single-letter characterization of the $\varepsilon$-capacity with/without cost constraints for 
mixed memoryless channels with general mixture (Theorem 1). The established formula reduces to the 
one for the channel capacity given by [1] and [5] when $\varepsilon$ is zero. The achievability and converse proofs 
of Theorem 1 proceed in a parallel manner: (i) the upper or lower bound on the error probability is 
characterized by the type (empirical distribution) of codewords and (ii) the convergence of a subsequence 
of types to a certain probability distribution is discussed. Next, a direct coding theorem (achievability) is 
given for the second-order optimum rate (Theorem 2). In the proof of Theorem 2, an upper bound on 
the error probability is derived based on the random coding argument of a fixed type, and the key is to 
specify the type of codewords so that the mutual information computed by this type converges to the 
target first-order coding rate fast enough (cf. Equation (98)). For a fixed code, 
on the other hand, we cannot guarantee that such mutual information converges 
to the target first-order coding rate fast enough, and this fact has prevented us from establishing the 
converse part of the second-order coding theorem. In order to circumvent this problem, we will introduce 
a subclass of mixed memoryless channels with general mixture, called well-ordered mixed memoryless 
channels, whose component channels can be ordered as discussed in [?]. For this channel class, the first- 
and second-order coding theorems are established. It is shown that the established formulas reduce to 
several known formulas for restricted scenarios. All coding theorems are proved based on the information 
spectrum methods (cf. [5], [?]). In particular, we use a proof technique for the converse part such that 
the proof proceeds based on an arbitrarily chosen converging subsequence of types of codewords, which 
may simplify even the proof of the second-order coding theorem for stationary memoryless channels such 
as in [?]. 

This paper is organized as follows: The problem addressed in this paper is stated in Sect. II. We next 
establish the first-order coding theorem in Sect. III-A and a direct part of the second-order coding theorem 
in Sect. III-B for mixed memoryless channels with general mixture. These theorems are proved in Sect. 
IV; several lemmas used to prove the theorems are first provided in Sect. IV-A, and then proofs of the 
coding theorems are given in Sect. IV-B and IV-C, respectively. Section V discusses well-ordered mixed 
memoryless channels, introduced in Sect. V-A, and the first- and second-order coding theorems are stated 
in Sect. V-B along with the proofs in Sect. V-C and V-D. Some concluding remarks are given in Sect. 
VI. 

II. Problem Formulation 
A. Mixed Memoryless Channel under General Mixture 

Consider a channel $W^n : \mathcal{X}^n \to \mathcal{Y}^n$ without any assumption on the memory structure, which 
stochastically maps an input sequence $x \in \mathcal{X}^n$ of length $n$ into an output sequence $y \in \mathcal{Y}^n$ of length 
$n$. Here, $\mathcal{X}$ and $\mathcal{Y}$ denote input and output alphabets, respectively. A sequence $\boldsymbol{W} := \{W^n\}_{n=1}^{\infty}$ of 
channels $W^n$ is referred to as a general channel [?]. 

We consider a mixed channel¹ with a general probability measure [5, Sect. 3.3]. Let $\Theta$ be an arbitrary 
probability space and assign a general channel $\boldsymbol{W}_\theta = \{W_\theta^n\}_{n=1}^{\infty}$ to each $\theta \in \Theta$, which are called component 
channels or simply components. Here, we assume that each $\boldsymbol{W}_\theta$ has the same input alphabet $\mathcal{X}$ and output 
alphabet $\mathcal{Y}$. With an arbitrary probability measure $w$ on $\Theta$, we define a mixed channel $\boldsymbol{W} = \{W^n\}_{n=1}^{\infty}$ 


with the conditional probability distribution given by 

$$W^n(y|x) = \int_\Theta W_\theta^n(y|x)\, dw(\theta) \qquad (\forall n = 1, 2, \ldots;\ \forall x \in \mathcal{X}^n,\ \forall y \in \mathcal{Y}^n). \tag{1}$$

¹Mixed channels are also referred to as averaged channels [1] or decomposable channels [?]. 


In this paper, we focus on the case where the component channels are stationary memoryless discrete 
channels. Then, a component channel can be denoted simply by $\boldsymbol{W}_\theta = \{W_\theta : \mathcal{X} \to \mathcal{Y}\}$. A mixed channel 
given by (1) with stationary memoryless discrete channels $\boldsymbol{W}_\theta = \{W_\theta\}$ is referred to as a general mixed 
memoryless channel for simplicity. 
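
As a small numerical illustration of definition (1), the following sketch evaluates $W^n(y|x)$ for a finite mixture of binary symmetric channels. The crossover probabilities and mixing weights are hypothetical values chosen only for illustration, and the finite sum stands in for the general integral over $\Theta$.

```python
from itertools import product

def bsc_prob(y, x, p):
    """W_theta^n(y|x) for a memoryless BSC with crossover probability p."""
    flips = sum(a != b for a, b in zip(x, y))
    return (p ** flips) * ((1 - p) ** (len(x) - flips))

def mixed_channel_prob(y, x, crossovers, weights):
    """W^n(y|x) = sum over theta of w(theta) * W_theta^n(y|x) (finite mixture)."""
    return sum(w * bsc_prob(y, x, p) for p, w in zip(crossovers, weights))

# Hypothetical components and mixing measure (assumptions, not from the paper).
crossovers = [0.1, 0.3]
weights = [0.5, 0.5]

x = (0, 1, 1)
# The mixed channel is a valid conditional distribution on Y^n.
total = sum(mixed_channel_prob(y, x, crossovers, weights)
            for y in product((0, 1), repeat=len(x)))

# The mixture is not memoryless: W^2(00|00) differs from W^1(0|0)^2.
single = mixed_channel_prob((0,), (0,), crossovers, weights)
double = mixed_channel_prob((0, 0), (0, 0), crossovers, weights)
```

Here `double` differs from `single ** 2`, which reflects the fact that the mixed channel is stationary but not memoryless even though every component is.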

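Let $\mathcal{C}_n$ be a code of length $n$ with $|\mathcal{C}_n| = M_n$ codewords. We denote the codeword 
corresponding to message $i \in \{1, 2, \ldots, M_n\}$ by $u_i$, i.e., $\mathcal{C}_n = \{u_1, u_2, \ldots, u_{M_n}\}$. We assume that the 
decoding region $D_i$ of $u_i$ satisfies 

$$\bigcup_{i=1}^{M_n} D_i = \mathcal{Y}^n \quad \text{and} \quad D_i \cap D_j = \emptyset \ \ (i \neq j). \tag{2}$$

The average probability of decoding error over $\boldsymbol{W}$ is defined as 

$$\varepsilon_n := \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(D_i^c \,|\, u_i), \tag{3}$$

where $D_i^c$ denotes the complement set of $D_i$ in $\mathcal{Y}^n$. Such a code $\mathcal{C}_n$ is referred to as an $(n, M_n, \varepsilon_n)$ code. 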
We consider a cost function $c_n(\cdot)$ for $x = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ defined as 

$$c_n(x) := \frac{1}{n} \sum_{i=1}^{n} c(x_i), \tag{4}$$

where $c : \mathcal{X} \to [0, \infty)$. A sequence $x$ is said to satisfy cost constraint $\Gamma$ if 

$$c_n(x) \le \Gamma, \tag{5}$$

and an $(n, M_n, \varepsilon_n)$ code is said to satisfy cost constraint $\Gamma$ if every codeword $u_i \in \mathcal{C}_n$ satisfies cost 
constraint $\Gamma$. 

Remark 1: If $\Gamma \ge \max_{x \in \mathcal{X}} c(x)$, then (5) holds for any $x \in \mathcal{X}^n$. This case corresponds to the coding 
system without cost constraints, which is indicated simply by $\Gamma = +\infty$. □ 


B. Optimum Coding Rates 

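Definition 1: A first-order coding rate $R \ge 0$ is said to be $(\varepsilon|\Gamma)$-achievable if there exists a sequence 
of $(n, M_n, \varepsilon_n)$ codes satisfying cost constraint $\Gamma$ such that 

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R. \tag{6}$$

The supremum of all $(\varepsilon|\Gamma)$-achievable rates is called the first-order $(\varepsilon|\Gamma)$-capacity and is denoted by 
$C_\varepsilon(\Gamma)$. We also write $C_\varepsilon = C_\varepsilon(+\infty)$ for simplicity. □ 

Set $\Gamma_0 := \min_{x \in \mathcal{X}} c(x)$. If $\Gamma < \Gamma_0$, then obviously $C_\varepsilon(\Gamma) = 0$ because no sequences $x \in \mathcal{X}^n$ satisfy 
cost constraint $\Gamma$, and hence no $R > 0$ is $(\varepsilon|\Gamma)$-achievable. 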

Let $M^*_{n,\varepsilon}$ denote the maximum size of codes of length $n$ and error probability less than or equal to $\varepsilon$ 
satisfying cost constraint $\Gamma$. The first-order $(\varepsilon|\Gamma)$-capacity indicates that $M^*_{n,\varepsilon}$ behaves as 

$$\log M^*_{n,\varepsilon} = n C_\varepsilon(\Gamma) + o(n)$$

for sufficiently large $n$. For coding systems whose first-order capacity has been characterized, our next 
target may be to characterize the second-order term of $\log M^*_{n,\varepsilon}$. This motivates us to introduce the second-order 
coding rates, whose maximum value, denoted by $D_\varepsilon(R|\Gamma)$ with respect to the first-order coding 
rate $R = C_\varepsilon(\Gamma)$, roughly satisfies the relation 

$$\log M^*_{n,\varepsilon} \simeq n C_\varepsilon(\Gamma) + \sqrt{n}\, D_\varepsilon(R|\Gamma) + o(\sqrt{n})$$

for sufficiently large $n$. Second-order achievable rates and their optimum value are now formally defined 
as follows. 

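Definition 2: A second-order coding rate $S$ is said to be $(\varepsilon, R|\Gamma)$-achievable if there exists a sequence 
of $(n, M_n, \varepsilon_n)$ codes satisfying cost constraint $\Gamma$ such that 

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{\sqrt{n}} \log \frac{M_n}{e^{nR}} \ge S. \tag{7}$$

The supremum of all $(\varepsilon, R|\Gamma)$-achievable rates is called the second-order $(\varepsilon, R|\Gamma)$-capacity and is denoted 
by $D_\varepsilon(R|\Gamma)$. We also write $D_\varepsilon(R) = D_\varepsilon(R|+\infty)$ for simplicity. □ 

Remark 2: It is easily verified that if $R < C_\varepsilon(\Gamma)$ then $D_\varepsilon(R|\Gamma) = +\infty$ for all $\varepsilon \in [0,1)$ from the 
definition of capacities. Also, if $R > C_\varepsilon(\Gamma)$ then $D_\varepsilon(R|\Gamma) = -\infty$ for all $\varepsilon \in [0,1)$. Therefore, only the 
case $R = C_\varepsilon(\Gamma)$ is of our main interest. □ 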


III. Coding Theorems for General Mixed Memoryless Channel 
A. First-Order Coding Theorems 

The following theorem gives a single-letter characterization for the first-order $(\varepsilon|\Gamma)$-capacity of mixed 
memoryless channels with general mixture. 

Theorem 1: Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure $w$. For any fixed $\varepsilon \in [0,1)$ 
and $\Gamma \ge \Gamma_0$, the first-order $(\varepsilon|\Gamma)$-capacity is given by 

$$C_\varepsilon(\Gamma) = \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup \left\{ R \,\middle|\, \int_{\{\theta | I(P, W_\theta) < R\}} dw(\theta) \le \varepsilon \right\}, \tag{8}$$

where $X_P$ indicates the input random variable subject to distribution $P$ on $\mathcal{X}$, and $I(P, W_\theta)$ denotes the 
mutual information with input $P$ and channel $W_\theta : \mathcal{X} \to \mathcal{Y}$ (cf. Csiszár and Körner [3]). □ 

The proof of this theorem is given in Sect. IV. 
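
To make formula (8) concrete, the following sketch evaluates it for a finite mixture of binary-input channels without cost constraints. It is only an illustration under stated assumptions: the component channels and weights are hypothetical, the outer supremum over $P$ is approximated by a grid search, and rates are measured in nats. For a finite mixture the inner supremum is attained at the smallest mutual-information value whose cumulative weight exceeds $\varepsilon$.

```python
import math

def mutual_information(P, W):
    """I(P, W) in nats for input distribution P and channel matrix W[x][y]."""
    ny = len(W[0])
    PW = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(ny)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(ny):
            if px > 0 and W[x][y] > 0:
                total += px * W[x][y] * math.log(W[x][y] / PW[y])
    return total

def inner_sup(infos, weights, eps):
    """sup{R : total weight of components with I_theta < R is <= eps}."""
    acc = 0.0
    for info, w in sorted(zip(infos, weights)):
        acc += w
        if acc > eps:
            return info
    return max(infos)  # only reached if the weights sum to at most eps

def eps_capacity(channels, weights, eps, grid=200):
    """Grid search over binary input distributions P = (1 - q, q)."""
    best = 0.0
    for k in range(grid + 1):
        q = k / grid
        infos = [mutual_information([1 - q, q], W) for W in channels]
        best = max(best, inner_sup(infos, weights, eps))
    return best

def bsc(p):
    """Transition matrix of a binary symmetric channel."""
    return [[1 - p, p], [p, 1 - p]]

# Hypothetical two-component mixture (an assumption for illustration).
channels = [bsc(0.1), bsc(0.3)]
weights = [0.5, 0.5]
low = eps_capacity(channels, weights, eps=0.2)   # governed by the worse BSC
high = eps_capacity(channels, weights, eps=0.6)  # governed by the better BSC
```

With these values, an error tolerance below the weight of the worse component yields the capacity of BSC(0.3), while a tolerance above it yields the capacity of BSC(0.1), which illustrates why the strong converse fails for mixed channels.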

Remark 3: If $\Theta$ is a singleton, Theorem 1 reduces to the well-known formula 

$$C_\varepsilon(\Gamma) = \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} I(P, W) \qquad (0 \le \varepsilon < 1), \tag{9}$$

which means that the strong converse holds in this case (cf. [3], [19]), unlike in the general case $|\Theta| > 1$. 
For $\Theta$ which is a finite or countably infinite set, formula (8) of the first-order capacity $C_\varepsilon(\Gamma)$ reduces to 
the formula given by Yagi and Nomura [20]. For mixed memoryless channels with general mixture, on 
the other hand, in the special case of $\varepsilon = 0$, formula (8) reduces to 

$$C_0(\Gamma) = \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} w\text{-ess.inf}\ I(P, W_\theta), \tag{10}$$

which coincides with the formula given by Han [5, Theorem 3.6.5], where $w$-ess.inf denotes the essential 
infimum of $I(P, W_\theta)$ with respect to the probability measure $w$. □ 

When $\Theta$ is a singleton, it is known that $C_\varepsilon(\Gamma)$ is concave in $\Gamma$ and is strictly increasing over 
the range $\Gamma_0 \le \Gamma \le \Gamma^*$, where $\Gamma^*$ denotes the smallest $\Gamma$ at which $C_\varepsilon(\Gamma)$ coincides with $C_\varepsilon$ (without 
cost constraints) (cf. Blahut [2]). For the case of $|\Theta| > 1$, $C_\varepsilon(\Gamma)$ is indeed non-decreasing, but there are 
examples of mixed memoryless channels for which $C_\varepsilon(\Gamma)$ is not strictly increasing in $\Gamma_0 \le \Gamma \le \Gamma^*$. This 
also indicates that $C_\varepsilon(\Gamma)$ need not be concave in $\Gamma$. 

In the case without cost constraints, Theorem 1 reduces to the following corollary. 

Corollary 1: Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure $w$. For any fixed $\varepsilon \in [0,1)$, 
the first-order $\varepsilon$-capacity is given by 

$$C_\varepsilon = \sup_{P} \sup \left\{ R \,\middle|\, \int_{\{\theta | I(P, W_\theta) < R\}} dw(\theta) \le \varepsilon \right\}, \tag{11}$$

where $\sup_P$ denotes the supremum over the set $\mathcal{P}(\mathcal{X})$ of all probability distributions on $\mathcal{X}$. □ 

Remark 4: The direct part of formula (11) was first demonstrated by Han [5, Lemma 3.3.3]. In the 
special case of $\varepsilon = 0$, we have an alternative formula of $C_0$ as in (10) (by replacing the supremum over 
$\{P \,|\, \mathrm{E}c(X_P) \le \Gamma\}$ with the supremum over $\mathcal{P}(\mathcal{X})$), which coincides with the formula given by Ahlswede 
[1]. See also [5, Remark 3.3.3] for the equivalence between these characterizations. □ 


B. Second-Order Coding Theorems 

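We now turn to analyzing second-order coding rates. Let $\Phi_{\theta,P}$ denote the Gaussian cumulative 
distribution function with zero mean and variance 

$$V_{\theta,P} := \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x) W_\theta(y|x) \left( \log \frac{W_\theta(y|x)}{PW_\theta(y)} - D(W_\theta(\cdot|x) \| PW_\theta) \right)^2, \tag{12}$$

that is, 

$$\Phi_{\theta,P}(z) := G\!\left( \frac{z}{\sqrt{V_{\theta,P}}} \right), \qquad G(z) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{t^2}{2}}\, dt, \tag{13}$$

where 

$$PW_\theta(y) := \sum_{x} P(x) W_\theta(y|x) \tag{14}$$

denotes the output distribution on $\mathcal{Y}$ due to the input distribution $P$ on $\mathcal{X}$ via channel $W_\theta$, and 
$D(W_\theta(\cdot|x) \| PW_\theta)$ denotes the divergence between $W_\theta(\cdot|x)$ and $PW_\theta$. It is known that there are stationary 
memoryless channels $W_\theta$ for which $V_{\theta,P} = 0$ for some $P \in \mathcal{P}(\mathcal{X})$ (cf. [11], [14]). In such a case, with 
an abuse of notation, we interpret $\Phi_{\theta,P}(z) = G(z/\sqrt{V_{\theta,P}})$ as the step function which is defined to take 
zero for $z < 0$ and one otherwise. 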

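For the second-order coding rate, we have the following direct theorem (achievability). 

Theorem 2 (Direct Part): Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure $w$. For $\varepsilon \in [0,1)$, 
$\Gamma \ge \Gamma_0$, and $R \ge 0$, it holds that 

$$D_\varepsilon(R|\Gamma) \ge \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup \left\{ S \,\middle|\, G_w(R, S|P) \le \varepsilon \right\}, \tag{15}$$

where 

$$G_w(R, S|P) := \int_{\{\theta | I(P, W_\theta) < R\}} dw(\theta) + \int_{\{\theta | I(P, W_\theta) = R\}} \Phi_{\theta,P}(S)\, dw(\theta). \tag{16}$$

□ 

The proof of this theorem is given in Sect. IV. 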
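
The quantities entering Theorem 2 can be evaluated numerically for a finite mixture. The sketch below computes the mutual information and the variance in (12) for each component and then the function in (16); the function and variable names, the component channels, the weights, and the tolerance used to decide $I(P, W_\theta) = R$ in floating point are all assumptions made for illustration, with rates in nats.

```python
import math
from statistics import NormalDist

def info_and_dispersion(P, W):
    """Return (I(P, W), V_P) in nats for channel matrix W[x][y], as in (12), (14)."""
    ny = len(W[0])
    PW = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(ny)]
    info, var = 0.0, 0.0
    for x, px in enumerate(P):
        if px == 0:
            continue
        terms = [(W[x][y], math.log(W[x][y] / PW[y]))
                 for y in range(ny) if W[x][y] > 0]
        d = sum(wxy * ll for wxy, ll in terms)  # D(W(.|x) || PW)
        info += px * d
        var += px * sum(wxy * (ll - d) ** 2 for wxy, ll in terms)
    return info, var

def gaussian_cdf_with_variance(z, v):
    """G(z / sqrt(v)); interpreted as a step function when v == 0."""
    if v == 0:
        return 0.0 if z < 0 else 1.0
    return NormalDist().cdf(z / math.sqrt(v))

def G_w(R, S, P, channels, weights, tol=1e-12):
    """Finite-mixture version of (16)."""
    total = 0.0
    for W, w in zip(channels, weights):
        info, var = info_and_dispersion(P, W)
        if info < R - tol:
            total += w                      # components with I < R
        elif abs(info - R) <= tol:
            total += w * gaussian_cdf_with_variance(S, var)  # components with I = R
    return total

def bsc(p):
    return [[1 - p, p], [p, 1 - p]]

# Hypothetical example: two equally weighted BSCs, uniform input.
P = [0.5, 0.5]
channels = [bsc(0.1), bsc(0.3)]
weights = [0.5, 0.5]
R_low, V_low = info_and_dispersion(P, bsc(0.3))
R_high, _ = info_and_dispersion(P, bsc(0.1))
g_low = G_w(R_low, 0.0, P, channels, weights)
g_high = G_w(R_high, 0.0, P, channels, weights)
```

At $S = 0$ the Gaussian term contributes half of the weight of the component with $I(P, W_\theta) = R$, so `g_low` is 0.25 and `g_high` is 0.75 in this example.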

Remark 5: The two terms on the right-hand side of (16) can be summarized into the following single 
term: 

$$\int_\Theta dw(\theta) \lim_{n\to\infty} \Phi_{\theta,P}\!\left( \sqrt{n}\,(R - I(P, W_\theta)) + S \right), \tag{17}$$

which is called the canonical representation (cf. Nomura and Han [9], [10]). Let us here focus on the 
crucial case of $R = C_\varepsilon(\Gamma)$. In view of formula (8) for the $\varepsilon$-capacity $C_\varepsilon(\Gamma)$, it is not difficult to check 
that, for any $P$ such that $\mathrm{E}c(X_P) \le \Gamma$, 

$$\int_{\{\theta | I(P, W_\theta) < C_\varepsilon(\Gamma)\}} dw(\theta) \le \varepsilon, \tag{18}$$

$$\int_{\{\theta | I(P, W_\theta) \le C_\varepsilon(\Gamma)\}} dw(\theta) \ge \varepsilon \tag{19}$$

hold. Thus, we may consider the following canonical equation for $S$: 

$$\int_\Theta dw(\theta) \lim_{n\to\infty} \Phi_{\theta,P}\!\left( \sqrt{n}\,(C_\varepsilon(\Gamma) - I(P, W_\theta)) + S \right) = \varepsilon. \tag{20}$$

Notice here, in view of (18) and (19), that equation (20) always has a solution. Let $S_P(\varepsilon)$ denote the 
solution of this equation, where $S_P(\varepsilon) = +\infty$ if the solution is not unique (notice that this case occurs when 
$\int_{\{\theta | I(P, W_\theta) = C_\varepsilon(\Gamma)\}} dw(\theta) = 0$, which equivalently means that the second term on the right-hand side in 
(16) is zero). Then, the right-hand side of (15) with $R = C_\varepsilon(\Gamma)$, denoted by $\underline{D}_\varepsilon(C_\varepsilon(\Gamma)|\Gamma)$, can be rewritten in a simpler form as 

$$\underline{D}_\varepsilon(C_\varepsilon(\Gamma)|\Gamma) = \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} S_P(\varepsilon). \tag{21}$$

We sometimes prefer this simple expression to the one in (15). □ 

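Remark 6: Denote the right-hand side of (15) again by $\underline{D}_\varepsilon(R|\Gamma)$. If $\Theta$ is a singleton, it can be easily 
verified that 

$$\underline{D}_\varepsilon(R|\Gamma) = \begin{cases} -\infty & \text{if } R > C_\varepsilon(\Gamma), \\ \displaystyle \sup_{\substack{P:\, I(P,W) = R \\ \mathrm{E}c(X_P) \le \Gamma}} \sup \{ S \,|\, \Phi_P(S) \le \varepsilon \} & \text{if } R = C_\varepsilon(\Gamma), \\ +\infty & \text{if } R < C_\varepsilon(\Gamma), \end{cases} \tag{22}$$

where, setting the singleton set $\Theta$ as $\Theta = \{\theta_0\}$, we use $\Phi_P$ instead of $\Phi_{\theta_0,P}$. In particular, if 

$$R = C_\varepsilon(\Gamma) = \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} I(P, W), \tag{23}$$

then it follows from Theorem 4 with $|\Theta| = 1$ later in Sect. V that 

$$D_\varepsilon(C_\varepsilon(\Gamma)|\Gamma) = \underline{D}_\varepsilon(C_\varepsilon(\Gamma)|\Gamma) = \begin{cases} \sqrt{V_{\max}}\; G^{-1}(\varepsilon) & \text{if } \varepsilon \ge \frac{1}{2}, \\ \sqrt{V_{\min}}\; G^{-1}(\varepsilon) & \text{if } \varepsilon < \frac{1}{2}, \end{cases} \tag{24}$$

where 

$$V_{\max} := \max_{\substack{P:\, I(P,W) = C_\varepsilon(\Gamma) \\ \mathrm{E}c(X_P) \le \Gamma}} V_P, \tag{25}$$

$$V_{\min} := \min_{\substack{P:\, I(P,W) = C_\varepsilon(\Gamma) \\ \mathrm{E}c(X_P) \le \Gamma}} V_P \tag{26}$$

by using $V_P$ instead of $V_{\theta_0,P}$. Formula (24) is due to Hayashi [?] (with cost constraint), Polyanskiy, Poor, 
and Verdú [11] (without cost constraints), and Strassen [14] (without cost constraints under the maximum 
error probability criterion). □ 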
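
For a single stationary memoryless channel, the well-known second-order expression $\sqrt{V}\, G^{-1}(\varepsilon)$ can be evaluated directly. The sketch below does so for a binary symmetric channel without cost constraints, for which the capacity-achieving input is the uniform distribution, so that the maximum and minimum variances coincide. The crossover probability, the block length, and the function names are hypothetical choices for illustration; rates are in nats, and $\varepsilon$ must lie strictly between 0 and 1 for the inverse Gaussian cdf to be defined.

```python
import math
from statistics import NormalDist

def bsc_capacity(p):
    """Capacity of BSC(p) in nats, attained by the uniform input."""
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    return math.log(2) - h

def bsc_dispersion(p):
    """Variance V_P of the information density for the uniform input on BSC(p)."""
    return p * (1 - p) * math.log((1 - p) / p) ** 2

def second_order_rate(p, eps):
    """sqrt(V) * G^{-1}(eps), the second-order term for a single BSC."""
    return math.sqrt(bsc_dispersion(p)) * NormalDist().inv_cdf(eps)

def approx_log_codebook_size(n, p, eps):
    """Normal approximation n*C + sqrt(n)*D, ignoring the o(sqrt(n)) term."""
    return n * bsc_capacity(p) + math.sqrt(n) * second_order_rate(p, eps)

# Hypothetical parameters for illustration.
p, eps, n = 0.11, 0.1, 1000
rate = second_order_rate(p, eps)
approx = approx_log_codebook_size(n, p, eps)
```

For $\varepsilon < 1/2$ the second-order term is negative, so the approximation lies below $n C_\varepsilon$; for $\varepsilon > 1/2$ it lies above.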

Similarly to the first-order coding theorem, Theorem 2 reduces to the following corollary in the case 
where there are no cost constraints. 

Corollary 2: Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure $w$. For $\varepsilon \in [0,1)$ and $R \ge 0$, 
it holds that 

$$D_\varepsilon(R) \ge \sup_{P} \sup \left\{ S \,\middle|\, G_w(R, S|P) \le \varepsilon \right\}. \tag{27}$$

□ 

IV. Proofs of Theorems 1 and 2 

A. Lemmas 

We state several lemmas which are used to prove Theorems 1 and 2. We first provide error bounds for 
codes of fixed length, which hold for any general channel. 

Lemma 1 (Feinstein's Upper Bound): For any input variable $X^n$ with values in $\mathcal{X}^n$, there exists an 
$(n, M_n, \varepsilon_n)$ code such that 

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n} \log M_n + \eta \right\} + e^{-n\eta}, \tag{28}$$

where² $Y^n$ is the output variable due to $X^n$ via channel $W^n$ and $\eta > 0$ is an arbitrary positive number. □ 

The following lemma was first established in [?, Lemma 4] in the context of quantum channel coding. 
The proof for the classical version is stated in [6, Sect. IX-B]. 

Lemma 2 (Hayashi-Nagaoka's Lower Bound)³: Let $Q^n$ be an arbitrary probability distribution on 
$\mathcal{Y}^n$. Every $(n, M_n, \varepsilon_n)$ code satisfies 

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{Q^n(Y^n)} \le \frac{1}{n} \log M_n - \eta \right\} - e^{-n\eta}, \tag{29}$$

where $X^n$ denotes the random variable subject to the uniform distribution on $\mathcal{C}_n$, $Y^n$ denotes the output 
variable due to $X^n$ via channel $W^n$, and $\eta > 0$ is an arbitrary positive number. □ 

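We next state lemmas for mixed channels. We first arrange a so-called expurgated parameter space 
which possesses a useful property and is still asymptotically dominant over the whole parameter space. 
Given a set of arbitrary i.i.d. product probability distributions $\{Q_\theta^n\}_{\theta \in \Theta}$ on $\mathcal{Y}^n$, let $Q^n$ be given as 

$$Q^n(y) := \int_\Theta Q_\theta^n(y)\, dw(\theta) \qquad (\forall y \in \mathcal{Y}^n), \tag{30}$$

and define 

$$\Theta(y) := \left\{ \theta \in \Theta \,\middle|\, Q_\theta^n(y) \le e^{\sqrt[4]{n}}\, Q^n(y) \right\} \qquad (\forall y \in \mathcal{Y}^n) \tag{31}$$

and 

$$\tilde{\Theta}(x, y) := \left\{ \theta \in \Theta \,\middle|\, W_\theta^n(y|x) \le e^{\sqrt[4]{n}}\, W^n(y|x) \right\} \qquad (\forall (x, y) \in \mathcal{X}^n \times \mathcal{Y}^n). \tag{32}$$

Let $S_k$, $k = 1, 2, \ldots, N_n$, denote a type (empirical distribution) on $\mathcal{Y}^n$, where $N_n$ is the number of all 
distinct types. Let $\tilde{S}_k$, $k = 1, 2, \ldots, \tilde{N}_n$, denote a joint type on $\mathcal{X}^n \times \mathcal{Y}^n$, where $\tilde{N}_n$ is the number of all 
distinct joint types. Since $Q_\theta^n$ is an i.i.d. product probability distribution, the subset $\Theta(y)$ depends only on 
the type $S_k$ of $y$, and therefore it can be denoted as $\Theta(S_k)$ instead of $\Theta(y)$. Likewise, since $W_\theta^n(y|x)$ is 
stationary and memoryless, the subset $\tilde{\Theta}(x, y)$ depends only on the joint type $\tilde{S}_k$ of $(x, y)$, and therefore 
it can be denoted as $\tilde{\Theta}(\tilde{S}_k)$ instead of $\tilde{\Theta}(x, y)$. Using 

$$\Theta_n := \bigcap_{k=1}^{N_n} \Theta(S_k) \quad \text{and} \quad \tilde{\Theta}_n := \bigcap_{k=1}^{\tilde{N}_n} \tilde{\Theta}(\tilde{S}_k), \tag{33}$$

we define another set 

$$\Theta_n^* := \Theta_n \cap \tilde{\Theta}_n. \tag{34}$$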

²For random variables $U$ and $V$, we let $P_U$ denote the probability distribution of $U$ and $P_{U|V}$ denote the conditional probability distribution 
of $U$ given $V$. 

³Later, we shall generalize this lemma to the mixed channel consisting of general component channels in Lemma 7, whose proof is given 
in Appendix D. 

Lemma 3: Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure $w$. Given a set of arbitrary 
i.i.d. product probability distributions $\{Q_\theta^n\}_{\theta \in \Theta}$ on $\mathcal{Y}^n$, let $Q^n$ be defined by (30). Then, it holds that 

$$\int_{\Theta_n^*} dw(\theta) \ge 1 - 2(n+1)^{|\mathcal{X}| \cdot |\mathcal{Y}|}\, e^{-\sqrt[4]{n}}. \tag{35}$$

(Proof) See Appendix A. □ 

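The following lemmas play a key role in proving the coding theorems for mixed channels. 

Lemma 4 (Upper Decomposition Lemma): Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure 
$w$. Then, it holds that 

$$\Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le z_n \right\} \le \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le z_n + \frac{\gamma}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} + e^{-\sqrt{n}\gamma} \qquad (\forall \theta \in \Theta_n^*), \tag{36}$$

where $\gamma > 0$ and $z_n > 0$ are arbitrary numbers, and $Y_\theta^n$ indicates the output variable due to the input $X^n$ 
via channel $W_\theta^n$. 

(Proof) See Appendix B. □ 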


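Lemma 5 (Lower Decomposition Lemma): Let $\boldsymbol{W}$ be a general mixed memoryless channel with measure 
$w$. Given a set of arbitrary i.i.d. product probability distributions $\{Q_\theta^n\}_{\theta \in \Theta}$ on $\mathcal{Y}^n$, let $Q^n$ be defined by (30). 
Then, it holds that 

$$\Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{Q^n(Y_\theta^n)} \le z_n \right\} \ge \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le z_n - \frac{\gamma}{\sqrt{n}} - \frac{1}{\sqrt[4]{n^3}} \right\} - e^{-\sqrt{n}\gamma} \qquad (\forall \theta \in \Theta_n^*), \tag{37}$$

where $\gamma > 0$ and $z_n > 0$ are arbitrary numbers, and $Y_\theta^n$ indicates the output variable due to the input $X^n$ 
via channel $W_\theta^n$. 

(Proof) See Appendix C. □ 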

Remark 7: As we shall show in the proof of Theorem 1 in the next subsection, there exists an 
interesting duality between the achievability proof and the converse proof based on Lemmas 4 and 5. 
Using the Upper/Lower Decomposition Lemma has been the standard technique in the analysis of the optimum 
coding rate in various problems in information theory such as source coding [5, Sect. 1.4], [10], random 
number generation [9], and hypothesis testing [5, Sect. 4.2] for mixed sources. The proof of Theorem 
1 demonstrates that we may also use this standard technique for mixed memoryless channels. Later, we 
shall also demonstrate in Sect. V-D that Lemma 7 can be used as a powerful alternative to Lemmas 2 
and 5, and it saves several steps of the converse proof. □ 


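B. Proof of Theorem 1 

(Proof of Direct Part) 
Define 

$$\underline{C}_\varepsilon(\Gamma) := \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup \left\{ R \,\middle|\, \int_{\{\theta | I(P, W_\theta) < R\}} dw(\theta) \le \varepsilon \right\}, \tag{38}$$

and then for any small $\delta > 0$ there exists an input distribution $P_0 \in \mathcal{P}(\mathcal{X})$ such that $\mathrm{E}c(X_{P_0}) \le \Gamma$ and 

$$\sup \left\{ R \,\middle|\, \int_{\{\theta | I(P_0, W_\theta) < R\}} dw(\theta) \le \varepsilon \right\} \ge \underline{C}_\varepsilon(\Gamma) - \delta. \tag{39}$$

We fix such a $P_0$ and show that 

$$R = \underline{C}_\varepsilon(\Gamma) - 4\delta \tag{40}$$

is $(\varepsilon|\Gamma)$-achievable. 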

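Without loss of generality, we assume that the elements in $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$ are indexed so that 
$c(1) \ge c(2) \ge \cdots \ge c(|\mathcal{X}|)$. We define the type $P_n$ on $\mathcal{X}^n$ so that 

$$P_n(x) = \frac{\lfloor n P_0(x) \rfloor}{n} \qquad (x = 1, 2, \ldots, |\mathcal{X}| - 1), \tag{41}$$

$$P_n(|\mathcal{X}|) = 1 - \sum_{x=1}^{|\mathcal{X}|-1} P_n(x). \tag{42}$$

Then, it is readily shown that 

$$\sum_{x \in \mathcal{X}} P_n(x) c(x) \le \Gamma, \tag{43}$$

$$|P_n(x) - P_0(x)| \le \frac{|\mathcal{X}|}{n} \qquad (\forall x \in \mathcal{X}), \tag{44}$$

and 

$$\lim_{n\to\infty} P_n(x) = P_0(x) \qquad (\forall x \in \mathcal{X}), \tag{45}$$

where (43) follows because $P_0$ satisfies $\sum_{x \in \mathcal{X}} P_0(x) c(x) \le \Gamma$. 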
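
The type construction used in this step can be checked numerically. The sketch below assumes a rounding-down construction: every symbol except the cheapest one (indexed last, since costs are sorted in non-increasing order) receives $\lfloor n P_0(x) \rfloor / n$, and the remaining mass is assigned to the cheapest symbol. This is one construction consistent with the cost bound, the $|\mathcal{X}|/n$ closeness bound, and the convergence to $P_0$; the distribution and costs in the example are hypothetical, and exact rational arithmetic is used to avoid floating-point rounding.

```python
from fractions import Fraction
from math import floor

def make_type(P0, n):
    """Round P0 down to multiples of 1/n, moving leftover mass to the last symbol.

    Assumes the symbols are ordered so that the last one has the smallest cost.
    """
    counts = [floor(n * p) for p in P0[:-1]]
    counts.append(n - sum(counts))  # leftover mass goes to the cheapest symbol
    return [Fraction(k, n) for k in counts]

def expected_cost(P, costs):
    return sum(p * c for p, c in zip(P, costs))

# Hypothetical input distribution and costs with c(1) >= c(2) >= c(3).
P0 = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]
costs = [3, 2, 1]

Pn = make_type(P0, 10)  # counts (3, 5, 2) out of n = 10
cost_ok = expected_cost(Pn, costs) <= expected_cost(P0, costs)
close_ok = all(abs(a - b) <= Fraction(len(P0), 10) for a, b in zip(Pn, P0))
```

Because mass is only moved from more expensive symbols to the cheapest one, the expected cost under the type never exceeds the expected cost under $P_0$.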

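Let $T_n$ be the set of all sequences $x \in \mathcal{X}^n$ of type $P_n$, and consider the input random variable $X^n$ 
uniformly distributed on $T_n$. Using Lemma 1 with $\frac{1}{n} \log M_n = R$ and $\eta = \frac{\gamma}{\sqrt{n}}$, where $\gamma > 0$ is an arbitrary 
positive number, we obtain the following chain of expansions: 

$$\begin{aligned}
\limsup_{n\to\infty} \varepsilon_n
&\le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} \\
&= \limsup_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} \\
&= \limsup_{n\to\infty} \left[ \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} + \int_{\Theta - \Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} \right] \\
&\le \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} + \limsup_{n\to\infty} \int_{\Theta - \Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} \\
&\le \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\} + \limsup_{n\to\infty} \int_{\Theta - \Theta_n^*} dw(\theta) \\
&= \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + \frac{\gamma}{\sqrt{n}} \right\}.
\end{aligned} \tag{46}$$

Here, we have used 

$$\int_{\Theta - \Theta_n^*} dw(\theta) \le 2(n+1)^{|\mathcal{X}| \cdot |\mathcal{Y}|}\, e^{-\sqrt[4]{n}} \tag{47}$$

(cf. Lemma 3) to obtain (46). We apply Lemma 4 with $z_n = R + \frac{\gamma}{\sqrt{n}}$ to (46) to obtain 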

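$$\begin{aligned}
\limsup_{n\to\infty} \varepsilon_n
&\le \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R + \frac{2\gamma}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} \\
&\le \limsup_{n\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R + \frac{2\gamma}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} \\
&\le \int_{\Theta} dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R + \frac{2\gamma}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\},
\end{aligned} \tag{48}$$

where the last inequality in (48) is due to Fatou's lemma, and the additive term $e^{-\sqrt{n}\gamma}$ from Lemma 4 
vanishes as $n \to \infty$. Now notice that 

$$\begin{aligned}
P_{Y_\theta^n}(y) &= \frac{1}{|T_n|} \sum_{x \in T_n} W_\theta^n(y|x) \\
&\le (n+1)^{|\mathcal{X}|} \sum_{x \in T_n} \prod_{i=1}^{n} P_n(x_i) W_\theta(y_i|x_i) \\
&\le (n+1)^{|\mathcal{X}|} (P_n W_\theta)^{\times n}(y) \qquad (\forall y \in \mathcal{Y}^n),
\end{aligned} \tag{49}$$

where $(P_n W_\theta)^{\times n}$ denotes the $n$ product distribution of 

$$P_n W_\theta(y) := \sum_{x \in \mathcal{X}} P_n(x) W_\theta(y|x) \qquad (\forall y \in \mathcal{Y}). \tag{50}$$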

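Plugging inequality (49) into (48), we obtain 

$$\begin{aligned}
\limsup_{n\to\infty} \varepsilon_n
&\le \int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{2\gamma}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} + \frac{|\mathcal{X}|}{n} \log(n+1) \right\} \\
&\le \int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \delta \right\}.
\end{aligned} \tag{51}$$

Inequality (51) implies that there exists $x_n \in \mathcal{X}^n$ of type $P_n$ such that 

$$\limsup_{n\to\infty} \varepsilon_n \le \int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \delta \,\middle|\, X^n = x_n \right\}. \tag{52}$$

Now, we can write 

$$\frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{P_n W_\theta(Y_{\theta,i})}, \tag{53}$$

where 

$$x_n = (x_1, x_2, \ldots, x_n), \qquad Y_\theta^n = (Y_{\theta,1}, Y_{\theta,2}, \ldots, Y_{\theta,n}).$$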
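Notice here that $Y_{\theta,1}, Y_{\theta,2}, \ldots, Y_{\theta,n}$ are conditionally independent random variables given $X^n = x_n$ (under 
the conditional distribution $W_\theta^n(\cdot|x_n)$), and therefore the right-hand side of (53) is a sum of conditionally 
independent random variables given $X^n = x_n$ with conditional mean 

$$\mathrm{E}\left\{ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{P_n W_\theta(Y_{\theta,i})} \,\middle|\, X^n = x_n \right\} = I(P_n, W_\theta) \tag{54}$$

and conditional variance 

$$\mathrm{V}\left\{ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{P_n W_\theta(Y_{\theta,i})} \,\middle|\, X^n = x_n \right\} = \frac{1}{n} \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_n(x) W_\theta(y|x) \left( \log \frac{W_\theta(y|x)}{P_n W_\theta(y)} - D(W_\theta(\cdot|x) \| P_n W_\theta) \right)^2 = \frac{1}{n} V_{\theta,P_n}. \tag{55}$$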


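Then, we can apply the weak law of large numbers to the probability term $\Pr\{\cdot\}$ in (52). To do so, 
we split the parameter space $\Theta$ as follows: 

$$\Theta_1 := \{\theta \in \Theta \,|\, I(P_0, W_\theta) < R + \delta\}, \tag{56}$$
$$\Theta_2 := \{\theta \in \Theta \,|\, I(P_0, W_\theta) = R + \delta\}, \tag{57}$$
$$\Theta_3 := \{\theta \in \Theta \,|\, I(P_0, W_\theta) > R + \delta\}. \tag{58}$$

It is easily verified that 

$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \delta \,\middle|\, X^n = x_n \right\} = \begin{cases} 1, & \text{if } \theta \in \Theta_1 \\ 0, & \text{if } \theta \in \Theta_3 \end{cases} \tag{59}$$

by virtue of the weak law of large numbers and (43), where we should notice that the inequality 

$$\max_{P \in \mathcal{P}(\mathcal{X})} V_{\theta,P} < +\infty \qquad (\forall \theta \in \Theta) \tag{60}$$

holds due to Han [5, Remark 3.1.1] and Polyanskiy et al. [11, Lemma 62]. Then, (52) is rewritten as 

$$\limsup_{n\to\infty} \varepsilon_n \le \int_{\Theta_1 \cup \Theta_2} dw(\theta) = \int_{\{\theta | I(P_0, W_\theta) \le \underline{C}_\varepsilon(\Gamma) - 3\delta\}} dw(\theta) \le \int_{\{\theta | I(P_0, W_\theta) < \underline{C}_\varepsilon(\Gamma) - 2\delta\}} dw(\theta) \le \varepsilon, \tag{61}$$

where the last inequality follows from (39). Hence, $R = \underline{C}_\varepsilon(\Gamma) - 4\delta$ is $(\varepsilon|\Gamma)$-achievable. 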

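□ 

(Proof of Converse Part) 

Assume that $R$ is $(\varepsilon|\Gamma)$-achievable. By the definition of $(\varepsilon|\Gamma)$-achievable rates, there exists an $(n, M_n, \varepsilon_n)$ 
code $\mathcal{C}_n$ with cost constraint $\Gamma$ such that, for an arbitrary $\delta > 0$, 

$$\frac{1}{n} \log M_n \ge R - \delta \qquad (\forall n > n_0). \tag{62}$$

By Lemma 2 with $\eta = \frac{\gamma}{\sqrt{n}}$, any $(n, M_n, \varepsilon_n)$ code satisfies 

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{Q^n(Y^n)} \le \frac{1}{n} \log M_n - \frac{\gamma}{\sqrt{n}} \right\} - e^{-\sqrt{n}\gamma}, \tag{63}$$

where $X^n$ denotes the random variable subject to the uniform distribution on the code and $\gamma > 0$ is an 
arbitrary number. The output distribution $Q^n$ in (63) is set as follows: Let $Q_\theta^n$ be an output distribution 
on $\mathcal{Y}^n$ indexed by $\theta \in \Theta$ such that 

$$Q_\theta^n(y) := \frac{1}{N_n} \sum_{P_n \in \mathcal{T}_n} (P_n W_\theta)^{\times n}(y) \qquad (\forall \theta \in \Theta,\ \forall y \in \mathcal{Y}^n), \tag{64}$$

where $\mathcal{T}_n$ denotes the set of all types on $\mathcal{X}^n$ of size $N_n := |\mathcal{T}_n|$, and $(P_n W_\theta)^{\times n}$ denotes the $n$ product 
distribution of $P_n W_\theta$. Using this $\{Q_\theta^n\}_{\theta \in \Theta}$, we define $Q^n$ as 

$$Q^n(y) := \int_\Theta Q_\theta^n(y)\, dw(\theta) \qquad (\forall y \in \mathcal{Y}^n), \tag{65}$$

where we notice that $Q_\theta^n(y)$ depends only on the type of $y$, and so does $Q^n(y)$. 
Since $R$ is $(\varepsilon|\Gamma)$-achievable, the following expansion follows from (62) and (63): 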

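$$\begin{aligned}
\varepsilon &\ge \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{Q^n(Y^n)} \le R - \delta - \frac{\gamma}{\sqrt{n}} \right\} \\
&\ge \limsup_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{Q^n(Y_\theta^n)} \le R - 2\delta \right\} \\
&\ge \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y_\theta^n|X^n)}{Q^n(Y_\theta^n)} \le R - 2\delta \right\}.
\end{aligned} \tag{66}$$

Applying Lemma 5 with $z_n = R - 2\delta$ to (66) yields 

$$\begin{aligned}
\varepsilon &\ge \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 2\delta - \frac{\gamma}{\sqrt{n}} - \frac{1}{\sqrt[4]{n^3}} \right\} \\
&\ge \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\} \\
&= \limsup_{n\to\infty} \left[ \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\} - \int_{\Theta - \Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\} \right] \\
&\ge \limsup_{n\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\} - \limsup_{n\to\infty} \int_{\Theta - \Theta_n^*} dw(\theta) \\
&= \limsup_{n\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\}.
\end{aligned} \tag{67}$$

Here, we have used (47) to obtain (67). Notice that 

$$\int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\} = \sum_{x \in \mathcal{C}_n} \frac{1}{M_n} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \,\middle|\, X^n = x \right\}, \tag{68}$$

and therefore there exists a codeword $x_n \in \mathcal{C}_n$ such that 

$$\int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \right\} \ge \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \,\middle|\, X^n = x_n \right\} \qquad (\forall n > n_0). \tag{69}$$

Let $P_n$ denote the type of such $x_n$. By (64), the right-hand side of (69) can be lower bounded as 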


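$$\begin{aligned}
\int_{\Theta} dw(\theta) &\Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{Q_\theta^n(Y_\theta^n)} \le R - 3\delta \,\middle|\, X^n = x_n \right\} \\
&\ge \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R - 3\delta - \frac{\log N_n}{n} \,\middle|\, X^n = x_n \right\} \\
&\ge \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R - 4\delta \,\middle|\, X^n = x_n \right\} \qquad (\forall n > n_0),
\end{aligned} \tag{70}$$

where we have used the inequality $N_n \le (n+1)^{|\mathcal{X}|}$ to obtain (70). Combining (67), (69), and (70) yields 

$$\varepsilon \ge \limsup_{n\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R - 4\delta \,\middle|\, X^n = x_n \right\}. \tag{71}$$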

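Since $\{P_n\}_{n > n_0}$ is a sequence in $\mathcal{P}(\mathcal{X})$ (compact set), it always contains a converging subsequence 
$\{P_{n_1}, P_{n_2}, \ldots\}$, where $n_1 < n_2 < \cdots \to \infty$. We denote the convergent point by $P_0$: 

$$\lim_{k\to\infty} P_{n_k} = P_0, \tag{72}$$

where it should be noticed that $P_0$ satisfies the cost constraint $\mathrm{E}c(X_{P_0}) \le \Gamma$ because $P_n$ satisfies the same 
cost constraint $\Gamma$. For notational simplicity, we relabel $n_k$ as $m = n_1, n_2, \ldots$. Then, in view of 

$$\limsup_{n\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|x_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R - 4\delta \,\middle|\, X^n = x_n \right\} \ge \limsup_{m\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \le R - 4\delta \,\middle|\, X^m = x_m \right\}, \tag{73}$$

(71) becomes 

$$\begin{aligned}
\varepsilon &\ge \limsup_{m\to\infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \le R - 4\delta \,\middle|\, X^m = x_m \right\} \\
&\ge \int_{\Theta} dw(\theta) \liminf_{m\to\infty} \Pr\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \le R - 4\delta \,\middle|\, X^m = x_m \right\} \qquad (74) \\
&\ge \int_{\Theta_1} dw(\theta) \liminf_{m\to\infty} \Pr\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \le R - 4\delta \,\middle|\, X^m = x_m \right\}, \qquad (75)
\end{aligned}$$

where the inequality in (74) is due to Fatou's lemma, and $\Theta_1$ is defined as 

$$\Theta_1 := \{\theta \in \Theta \,|\, I(P_0, W_\theta) < R - 4\delta\}. \tag{76}$$

Set $x_m = (x_1, x_2, \ldots, x_m)$, and then 

$$\frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} = \frac{1}{m} \sum_{i=1}^{m} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{(P_m W_\theta)(Y_{\theta,i})} \tag{77}$$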


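is a sum of conditionally independent random variables given $X^m = x_m$, and its expectation and variance 
under $W_\theta^m(\cdot|x_m)$ are given by 

$$\mathrm{E}\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \,\middle|\, X^m = x_m \right\} = I(P_m, W_\theta) \tag{78}$$

and 

$$\mathrm{V}\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \,\middle|\, X^m = x_m \right\} = \frac{1}{m} \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_m(x) W_\theta(y|x) \left( \log \frac{W_\theta(y|x)}{(P_m W_\theta)(y)} - D(W_\theta(\cdot|x) \| P_m W_\theta) \right)^2, \tag{79}$$

respectively. Hence, the weak law of large numbers guarantees 

$$\liminf_{m\to\infty} \Pr\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|x_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} \le R - 4\delta \,\middle|\, X^m = x_m \right\} = 1 \qquad (\forall \theta \in \Theta_1). \tag{80}$$

Thus, (75) is rewritten as 

$$\varepsilon \ge \int_{\Theta_1} dw(\theta) = \int_{\{\theta | I(P_0, W_\theta) < R - 4\delta\}} dw(\theta). \tag{81}$$

Therefore, from the definition of $\underline{C}_\varepsilon(\Gamma)$ (cf. (38)), we have 

$$R - 4\delta \le \underline{C}_\varepsilon(\Gamma). \tag{82}$$

Since $\delta > 0$ is arbitrary, we conclude that $R \le \underline{C}_\varepsilon(\Gamma)$. 

□ 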


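C. Proof of Theorem 2 
We first define 

$$\underline{D}_\varepsilon(R|\Gamma) := \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup \{ S \,|\, G_w(R, S|P) \le \varepsilon \}, \tag{83}$$

where $G_w(R, S|P)$ is defined in (16). Then, for any $\delta > 0$ there exists an input distribution 
$P_0 \in \mathcal{P}(\mathcal{X})$ such that $\mathrm{E}c(X_{P_0}) \le \Gamma$, where $X_{P_0}$ denotes the random variable subject to $P_0$, and 

$$\sup \{ S \,|\, G_w(R, S|P_0) \le \varepsilon \} \ge \underline{D}_\varepsilon(R|\Gamma) - \delta. \tag{84}$$

We shall show that $S = \underline{D}_\varepsilon(R|\Gamma) - 4\delta$ is $(\varepsilon, R|\Gamma)$-achievable. 

Fix a $P_0$ satisfying (84) and a constant $\gamma > 0$ such that $\delta > 2\gamma$. By Lemma 1 with 

$$\frac{1}{n} \log M_n = R + \frac{1}{\sqrt{n}} \left( \underline{D}_\varepsilon(R|\Gamma) - 4\delta \right) \tag{85}$$

and $\eta = \frac{\gamma}{\sqrt{n}}$, we have 

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R + \frac{1}{\sqrt{n}} \left( \underline{D}_\varepsilon(R|\Gamma) - 4\delta + \gamma \right) \right\} + e^{-\sqrt{n}\gamma}. \tag{86}$$

We choose a type $P_n$ on $\mathcal{X}^n$ so as to be specified by (43)-(45). Let $X^n$ be the uniformly distributed input 
random variable on $T_n$, defined to be the set of all sequences $x \in \mathcal{X}^n$ of type $P_n$. Then, we have 

$$P_{Y_\theta^n}(y) \le (n+1)^{|\mathcal{X}|} (P_n W_\theta)^{\times n}(y) \qquad (\forall y \in \mathcal{Y}^n) \tag{87}$$

by (49). Then, by (86), we obtain 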

^ f 1 , W^CV^IX^) 

hm sup En < lim sup Pr - log , . 

n^oo n^oo ^ i^ynyi J 

<limsup f dw{e) Pr I^ log ^ 

n^oo Js I, n ry^y^Q ) 

.• r /■ , /mT. f 1 1 W^CV.^IX^) 
= limsup / } 1 ^.. V ei ; 

n —>00 L>/ 0 : 


<R+ ^D,iR\T)-A6 + ^) 

\ n 


<R + ^{D,{R\T)-AS + ^)^ 


dw{6) Pr < — log 
r 

dw{6) Pr < i log 


n Pv^iY^) 
1 W'^{Y^\X^) 


<R+^{D,{R\T)-A5 + ^) 

\ n 




Je-ei 

< limsup / dw{e) Pr - log ^ . < i? + ^ (^^(i^lP) -45 + 7 ) 

n—>00 ./©* I ri lynyig j yU 

+ lim sup / 
rn-oo Je-e* 

= limsup / ciw;(0)Pr -log—4^</2+^(D,(i?|P)-45 + 7)k (88) 


1 ]Y'’^(Y'^\1 _ 


n—^00 J Q* 

where the last equality is due to (l47l) . _ 

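Now by (87) and Lemma 4 with $z_n = R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 4\delta + \gamma \right)$,

\begin{align}
\limsup_{n \to \infty} \varepsilon_n &\le \limsup_{n \to \infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 4\delta + 2\gamma \right) + \frac{1}{\sqrt[4]{n^3}} \right\} \notag \\
&\le \limsup_{n \to \infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 4\delta + 2\gamma \right) + \frac{1}{\sqrt[4]{n^3}} + \frac{|\mathcal{X}| \log(n+1)}{n} \right\} \notag \\
&\le \limsup_{n \to \infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \right\}, \tag{89}
\end{align}

where the last inequality holds for all sufficiently large $n$ because $\delta > 2\gamma$.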


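Since

\begin{align}
&\int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \right\} \notag \\
&\quad = \sum_{\boldsymbol{x} \in T_n} \Pr\{X^n = \boldsymbol{x}\} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x})}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x} \right\}, \notag
\end{align}

there exists an $\boldsymbol{x}_n \in T_n$ such that

\begin{align}
\limsup_{n \to \infty} \varepsilon_n &\le \limsup_{n \to \infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\} \tag{90} \\
&\le \limsup_{n \to \infty} \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\} \notag \\
&\le \int_{\Theta} dw(\theta) \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\}, \tag{91}
\end{align}

where the last inequality is due to Fatou's lemma.
Now, again since

\[
\frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)}
\]

is a sum of conditionally independent random variables given $X^n = \boldsymbol{x}_n$, by virtue of (45), the mean and variance relations (cf. (55)), and the weak law of large numbers, we have

\[
\limsup_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\} =
\begin{cases}
1, & \text{if } \theta \in \Theta_1 \\
0, & \text{if } \theta \in \Theta_3,
\end{cases} \tag{92}
\]

where $\Theta_i$ $(i = 1, 2, 3)$ is defined as

\begin{align}
\Theta_1 &:= \{\theta \in \Theta \mid I(P_0, W_\theta) < R\}, \tag{93} \\
\Theta_2 &:= \{\theta \in \Theta \mid I(P_0, W_\theta) = R\}, \tag{94} \\
\Theta_3 &:= \{\theta \in \Theta \mid I(P_0, W_\theta) > R\}. \tag{95}
\end{align}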

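Thus,

\[
\limsup_{n \to \infty} \varepsilon_n \le \int_{\Theta_1} dw(\theta) + \int_{\Theta_2} dw(\theta) \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le R + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\}. \tag{96}
\]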


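Denoting the second term on the right-hand side by $B$, we have

\begin{align}
B &= \int_{\{\theta \mid I(P_0, W_\theta) = R\}} dw(\theta) \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} \le I(P_0, W_\theta) + \frac{1}{\sqrt{n}} \left( D_\varepsilon(R|\Gamma) - 3\delta \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\} \notag \\
&= \int_{\{\theta \mid I(P_0, W_\theta) = R\}} dw(\theta) \limsup_{n \to \infty} \Pr\left\{ \frac{1}{\sqrt{n}} \left( \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} - n I(P_n, W_\theta) \right) \le D_\varepsilon(R|\Gamma) - 3\delta + \sqrt{n} \left( I(P_0, W_\theta) - I(P_n, W_\theta) \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\}. \tag{97}
\end{align}

Now, we notice that, owing to (44),

\[
\lim_{n \to \infty} \sqrt{n} \left( I(P_0, W_\theta) - I(P_n, W_\theta) \right) = 0 \tag{98}
\]

and

\[
\lim_{n \to \infty} V_{\theta, P_n} = V_{\theta, P_0}, \tag{99}
\]

and therefore, for $\theta \in \Theta_2$ with $V_{\theta, P_0} > 0$ the central limit theorem assures that

\begin{align}
&\limsup_{n \to \infty} \Pr\left\{ \frac{1}{\sqrt{n}} \left( \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} - n I(P_n, W_\theta) \right) \le D_\varepsilon(R|\Gamma) - 3\delta + \sqrt{n} \left( I(P_0, W_\theta) - I(P_n, W_\theta) \right) \,\middle|\, X^n = \boldsymbol{x}_n \right\} \notag \\
&\quad = \limsup_{n \to \infty} \Pr\left\{ \frac{1}{\sqrt{n}} \left( \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} - n I(P_n, W_\theta) \right) \le D_\varepsilon(R|\Gamma) - 3\delta \,\middle|\, X^n = \boldsymbol{x}_n \right\} \notag \\
&\quad \le \Phi_{\theta, P_0}\left( D_\varepsilon(R|\Gamma) - 2\delta \right). \tag{100}
\end{align}

For $\theta \in \Theta_2$ with $V_{\theta, P_0} = 0$, we interpret $\Phi_{\theta, P_0}(z)$ as the step function which takes zero for $z < 0$ and one otherwise. It is easily verified that (100) also holds for such $\theta \in \Theta_2$, and hence

\[
B \le \int_{\{\theta \mid I(P_0, W_\theta) = R\}} \Phi_{\theta, P_0}\left( D_\varepsilon(R|\Gamma) - 2\delta \right) dw(\theta). \tag{101}
\]

Thus, by (96),

\[
\limsup_{n \to \infty} \varepsilon_n \le \int_{\{\theta \mid I(P_0, W_\theta) < R\}} dw(\theta) + \int_{\{\theta \mid I(P_0, W_\theta) = R\}} \Phi_{\theta, P_0}\left( D_\varepsilon(R|\Gamma) - 2\delta \right) dw(\theta) \le \varepsilon, \tag{102}
\]

where the last inequality follows from (84), implying that $D_\varepsilon(R|\Gamma) - 4\delta$ is $(\varepsilon, R|\Gamma)$-achievable. □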
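As a purely illustrative numerical aside (not part of the proof), the Gaussian approximation behind the central limit step can be compared with an exact computation in the simplest possible case. The sketch below assumes a single binary symmetric channel with crossover probability $p < 1/2$ and the uniform output distribution, for which the centered log-likelihood ratio equals $-(K - np)\log\frac{1-p}{p}$ with $K$ the number of bit flips. The function name `exact_vs_gaussian` and all parameter choices are hypothetical and serve this example only.

```python
import math
from statistics import NormalDist


def exact_vs_gaussian(p: float, n: int, s: float):
    """Return (exact, gaussian) values of
    Pr{ (log-likelihood ratio - n*I) / sqrt(n) <= s }
    for a BSC(p), 0 < p < 1/2, with uniform output distribution (assumption)."""
    log_odds = math.log((1 - p) / p)          # L = log((1-p)/p) > 0
    dispersion = p * (1 - p) * log_odds ** 2  # V = p(1-p) L^2, in nats^2
    # The centered statistic is -(K - n p) L, so the event is {K >= k0}.
    k0 = math.ceil(n * p - s * math.sqrt(n) / log_odds)
    exact = sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(max(k0, 0), n + 1)
    )
    gaussian = NormalDist().cdf(s / math.sqrt(dispersion))
    return exact, gaussian


if __name__ == "__main__":
    print(exact_vs_gaussian(0.11, 400, 0.3))
```

The gap between the two returned values is controlled by the Berry-Esseen theorem and shrinks on the order of $1/\sqrt{n}$, which is the mechanism used in the proofs of this paper.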

V. Coding Theorems for Well-Ordered Mixed Memoryless Channel 
A. Well-Ordered Mixed Memoryless Channel 

So far, in Sect. III-B we have established Theorem 2 on the second-order capacity for the mixed memoryless channel with general mixture; however, unfortunately, this theorem lacks the converse part. Thus, in this section, we are led to introduce a subclass of general mixed memoryless channels for which the second-order coding theorem is established, including both the direct and converse parts.

Definition 3: Let $\mathcal{W}_\Theta := \{W_\theta : \mathcal{X} \to \mathcal{Y}\}_{\theta \in \Theta}$ be a family of stationary memoryless channels. Let $c_{\theta,\Gamma}$ denote the capacity of component channel $W_\theta$ with cost constraint $\Gamma$ $(\ge \Gamma_0)$, that is,

\[
c_{\theta,\Gamma} = \max_{P:\, \mathrm{E}c(X_P) \le \Gamma} I(P, W_\theta), \tag{103}
\]

and let $\Pi_{\theta,\Gamma}$ denote the set of input probability distributions $P$ on $\mathcal{X}$ that achieve $c_{\theta,\Gamma}$. It should be noted that $\Pi_{\theta,\Gamma}$ is a bounded closed set. If $\mathcal{W}_\Theta$ is closed and, for any $\theta \in \Theta$ and any $P \in \Pi_{\theta,\Gamma}$, it holds that

\begin{align}
c_{\theta,\Gamma} &= I(P, W_{\theta'}) \quad \text{for } \theta' \in \Theta \text{ s.t. } c_{\theta,\Gamma} = c_{\theta',\Gamma} \text{ and} \notag \\
c_{\theta,\Gamma} &< I(P, W_{\theta'}) \quad \text{for } \theta' \in \Theta \text{ s.t. } c_{\theta,\Gamma} < c_{\theta',\Gamma}, \tag{104}
\end{align}

then $\mathcal{W}_\Theta$ is said to be well-ordered with cost constraint $\Gamma$, or simply $\Gamma$-well-ordered. A mixed memoryless channel $\boldsymbol{W}$ with $\Gamma$-well-ordered $\mathcal{W}_\Theta$ is referred to as a $\Gamma$-well-ordered mixed memoryless channel. □

Remark 8: For a $\Gamma$-well-ordered mixed memoryless channel, it is not difficult to check that

\[
\Pi_{\theta,\Gamma} = \Pi_{\theta',\Gamma} \quad \text{if } c_{\theta,\Gamma} = c_{\theta',\Gamma} \text{ for } \theta, \theta' \in \Theta, \tag{105}
\]

that is, two component channels with equal capacity have the same set of capacity-achieving input distributions. □

Remark 9: The assumption that $\mathcal{W}_\Theta$ is closed is made just for a technical reason. Even in the case where $\mathcal{W}_\Theta$ is not closed, if its closure, denoted by $\overline{\mathcal{W}}_\Theta$ (with extended parameter space $\overline{\Theta}$), is $\Gamma$-well-ordered, all coding theorems we shall establish also hold for the mixed channel $\boldsymbol{W}$ with the original $\mathcal{W}_\Theta$. □

Example 1: For two channels $W_\theta$ and $W_{\theta'}$, channel $W_{\theta'}$ is said to be more capable than $W_\theta$ if $I(P, W_\theta) \le I(P, W_{\theta'})$ for all $P \in \mathcal{P}(\mathcal{X})$ [3]. If $W_{\theta'}$ is more capable than $W_\theta$ for all $\theta, \theta' \in \Theta$ such that $c_\theta \le c_{\theta'}$, then $\mathcal{W}_\Theta$ is $\Gamma$-well-ordered for all $\Gamma \ge \Gamma_0$, where $c_\theta$ denotes the capacity of $W_\theta$ with no cost constraints. The following are examples of such $\mathcal{W}_\Theta$:

• A family of binary symmetric channels which forms a closed set.

• More generally, a closed set of additive noise channels for which additive noise $Z \sim W_\theta(\cdot|\cdot)$ is a degraded version of additive noise $Z' \sim W_{\theta'}(\cdot|\cdot)$ for all $\theta, \theta' \in \Theta$ such that $c_\theta \le c_{\theta'}$. □
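To make condition (104) concrete, the following sketch checks it numerically for a finite family of channels in the case without cost constraints. It is only an illustration under stated assumptions: the family is finite, each channel is supplied together with one capacity-achieving input distribution that is assumed to be the only one, and equality is tested up to a numerical tolerance. The helper names `mutual_information`, `bsc`, and `satisfies_condition_104` are hypothetical.

```python
import math


def mutual_information(p, w):
    """I(P, W) in nats for input distribution p[x] and channel matrix w[x][y]."""
    n_out = len(w[0])
    q = [sum(p[x] * w[x][y] for x in range(len(p))) for y in range(n_out)]
    return sum(
        p[x] * w[x][y] * math.log(w[x][y] / q[y])
        for x in range(len(p))
        for y in range(n_out)
        if p[x] > 0 and w[x][y] > 0
    )


def bsc(p):
    """Channel matrix of a binary symmetric channel with crossover p."""
    return [[1 - p, p], [p, 1 - p]]


def satisfies_condition_104(channels, optimal_inputs, tol=1e-9):
    """Check (104) for a finite family, given one capacity-achieving input
    per channel (assumed unique; no cost constraint)."""
    caps = [mutual_information(p, w) for p, w in zip(optimal_inputs, channels)]
    for i, p in enumerate(optimal_inputs):
        for j, w in enumerate(channels):
            value = mutual_information(p, w)
            # Equal capacities: the optimal input must achieve the same value.
            if abs(caps[i] - caps[j]) <= tol and abs(value - caps[i]) > tol:
                return False
            # Strictly larger capacity: the value must be strictly larger.
            if caps[i] < caps[j] - tol and value <= caps[i] + tol:
                return False
    return True
```

For a family of binary symmetric channels the uniform input is optimal for every component, so the check succeeds; a pair consisting of a Z-channel and its mirror image has equal capacities but different optimal inputs, and the check fails.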

Example 2: In the special case of $\Gamma = +\infty$ (that is, without cost constraints), we may find many more examples of $\Gamma$-well-ordered $\mathcal{W}_\Theta$. A family of output-symmetric channels which forms a closed set is $\Gamma$-well-ordered since the capacity-achieving input distribution is uniform on $\mathcal{X}$ and unique (cf. Shannon uni). □

Set $E_{\Theta,\Gamma} := \{c_{\theta,\Gamma} \mid \theta \in \Theta\}$. We show an important property of $\Gamma$-well-ordered mixed memoryless channels.

Lemma 6: If $\mathcal{W}_\Theta$ is closed, then $E_{\Theta,\Gamma}$ is bounded and closed for all $\Gamma \ge \Gamma_0$.

(Proof) Boundedness of $E_{\Theta,\Gamma}$ is obvious, so we shall show its closedness. Let a function $f : \mathcal{P}(\mathcal{X} \to \mathcal{Y}) \to [0, +\infty)$ be defined as

\[
f(W) := \max_{P \in \mathcal{P}_c} I(P, W) \tag{106}
\]

for a given closed convex set $\mathcal{P}_c \subseteq \mathcal{P}(\mathcal{X})$, where $\mathcal{P}(\mathcal{X} \to \mathcal{Y})$ denotes the set of all channel matrices $W : \mathcal{X} \to \mathcal{Y}$. Since $I(P, W)$ is continuous with respect to $(P, W)$, $f(W)$ is a continuous function of $W$. The closed set $\mathcal{W}_\Theta$ is a subset of the compact set $\mathcal{P}(\mathcal{X} \to \mathcal{Y})$ and hence compact, so its image by a continuous function is also closed. Hence, since $\mathcal{P}_c := \{P \in \mathcal{P}(\mathcal{X}) \mid \mathrm{E}c(X_P) \le \Gamma\}$ is closed and convex, we can conclude that $E_{\Theta,\Gamma} = f(\mathcal{W}_\Theta)$ is closed. □


B. Coding Theorems 

We first provide a characterization of the first-order capacity $C_\varepsilon(\Gamma)$, which is different from the one in Theorem 1, for $\Gamma$-well-ordered mixed memoryless channels. This alternative characterization is of simpler form and is of great use in analyzing the second-order capacity later.

Theorem 3: Let $\boldsymbol{W}$ be a $\Gamma$-well-ordered mixed memoryless channel with general measure $w$. For any fixed $\varepsilon \in [0,1)$ and $\Gamma \ge \Gamma_0$, the first-order $(\varepsilon|\Gamma)$-capacity is given by

\[
C_\varepsilon(\Gamma) = \sup\left\{ R \,\middle|\, \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) \le \varepsilon \right\}. \tag{107}
\]

□
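For a finite mixture, formula (107) can be evaluated directly: the supremum equals the smallest component capacity whose cumulative weight (counting all components of no larger capacity) exceeds $\varepsilon$. The sketch below is an illustration under assumptions that are not part of the theorem: a finite mixture of binary symmetric channels, no cost constraint, and capacities measured in nats. The function names `bsc_capacity` and `eps_capacity` are hypothetical.

```python
import math


def bsc_capacity(p: float) -> float:
    """Capacity (in nats) of a binary symmetric channel with crossover p."""
    if p in (0.0, 1.0):
        return math.log(2)
    entropy = -p * math.log(p) - (1 - p) * math.log(1 - p)
    return math.log(2) - entropy


def eps_capacity(capacities, weights, eps: float) -> float:
    """Evaluate sup{R : w({theta : c_theta < R}) <= eps} for a finite mixture.

    This is the smallest capacity c such that the total weight of the
    components with capacity <= c exceeds eps.
    """
    assert 0 <= eps < 1 and abs(sum(weights) - 1) < 1e-12
    cumulative = 0.0
    for capacity, weight in sorted(zip(capacities, weights)):
        cumulative += weight
        if cumulative > eps:
            return capacity
    return max(capacities)  # fallback for rounding; not expected to be reached


if __name__ == "__main__":
    caps = [bsc_capacity(p) for p in (0.1, 0.2, 0.3)]
    print(eps_capacity(caps, [0.5, 0.3, 0.2], 0.25))
```

With $\varepsilon = 0$ the routine returns the smallest component capacity, in line with the usual capacity of a mixed channel being governed by its worst component.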


Remark 10: Due to the closedness of $E_{\Theta,\Gamma}$, for every $\varepsilon \in [0,1)$ there exists some $\theta^* \in \Theta$ such that $C_\varepsilon(\Gamma) = c_{\theta^*,\Gamma}$. This fact is shown in the proof of the direct part of Theorem 3 in Sect. V-C. □

Remark 11: The characterization (107) with $\Gamma = +\infty$ is a generalization of the one given by Winkelbauer ifT^ in the sense that the class of $\Gamma$-well-ordered mixed channels with $\Gamma = +\infty$ is wider than the class of regular decomposable channels with stationary memoryless components. On the other hand, the regular decomposability allows component channels to be stationary and ergodic, which means that the characterization (107) with $\Gamma = +\infty$ is a particularization of the one given in ifT^ . □

Now, we turn to discussing the second-order capacity of $\Gamma$-well-ordered mixed memoryless channels. In contrast to mixed memoryless channels with general mixture, for which only the direct part of the second-order coding theorem (Theorem 2) has been given, $\Gamma$-well-orderedness allows us to establish the converse theorem as well.


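Theorem 4: Let $\boldsymbol{W}$ be a $\Gamma$-well-ordered mixed memoryless channel with general measure $w$. Then, for $\varepsilon \in [0,1)$, $\Gamma \ge \Gamma_0$, and $R \ge 0$, it holds that

\begin{align}
D_\varepsilon(R|\Gamma) &= \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup\left\{ S \,\middle|\, \int_{\{\theta \mid I(P, W_\theta) < R\}} dw(\theta) + \int_{\{\theta \mid I(P, W_\theta) = R\}} \Phi_{\theta,P}(S)\, dw(\theta) \le \varepsilon \right\} \notag \\
&= \sup_{P \in \Pi_{\theta^*,\Gamma}} \sup\left\{ S \,\middle|\, \int_{\{\theta \mid I(P, W_\theta) < R\}} dw(\theta) + \int_{\{\theta \mid I(P, W_\theta) = R\}} \Phi_{\theta,P}(S)\, dw(\theta) \le \varepsilon \right\} \notag \\
&= \sup_{P \in \Pi_{\theta^*,\Gamma}} \sup\left\{ S \,\middle|\, \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) + \int_{\{\theta \mid c_{\theta,\Gamma} = R\}} \Phi_{\theta,P}(S)\, dw(\theta) \le \varepsilon \right\}, \tag{108}
\end{align}

where $\theta^* \in \Theta$ gives the $(\varepsilon|\Gamma)$-capacity, that is, $C_\varepsilon(\Gamma) = c_{\theta^*,\Gamma}$. □

Remark 12: Formula (108) has been established for the case of $|\Theta| < +\infty$ by Yagi and Nomura ll^ . When the component channels are output-symmetric and $\Gamma = +\infty$, the first supremum (with respect to $P$) on the right-hand side of (108) is attained by only the uniform inputs, which may facilitate the proof of the coding theorem. □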

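Remark 13: It is not difficult to check from formula (107) that

\begin{align}
\int_{\{\theta \mid c_{\theta,\Gamma} < C_\varepsilon(\Gamma)\}} dw(\theta) &\le \varepsilon, \tag{109} \\
\int_{\{\theta \mid c_{\theta,\Gamma} \le C_\varepsilon(\Gamma)\}} dw(\theta) &\ge \varepsilon \tag{110}
\end{align}

hold, and as in Remark 5, here also we may consider the following canonical equation for $S$:

\[
\int_{\Theta} dw(\theta) \lim_{n \to \infty} \Phi_{\theta,P}\left( \sqrt{n} \left( C_\varepsilon(\Gamma) - c_{\theta,\Gamma} \right) + S \right) = \varepsilon. \tag{111}
\]

Notice here, in view of (109) and (110), that equation (111) always has a solution. Let $S_P(\varepsilon)$ denote the solution of this equation, where $S_P(\varepsilon) = +\infty$ if the solution is not unique. Then, $D_\varepsilon(C_\varepsilon(\Gamma)|\Gamma)$ (i.e., $R = C_\varepsilon(\Gamma)$) can be rewritten in a simpler form as

\[
D_\varepsilon(C_\varepsilon(\Gamma)|\Gamma) = \sup_{P \in \Pi_{\theta^*,\Gamma}} S_P(\varepsilon), \tag{112}
\]

which is again sometimes preferable to the expression in (108). □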
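The last expression in (108) can be evaluated numerically for a finite mixture once the input distribution is fixed. The sketch below is an illustration only: it assumes a finite mixture of binary symmetric channels without cost constraint, the uniform input (so that $V_{\theta,P} = p(1-p)\log^2\frac{1-p}{p}$ in nats squared), and a solution lying in a fixed bisection bracket. The names `bsc_capacity`, `bsc_dispersion`, `phi`, and `second_order_rate` are hypothetical.

```python
import math
from statistics import NormalDist


def bsc_capacity(p: float) -> float:
    """Capacity (nats) of a BSC(p) with 0 < p < 1."""
    return math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)


def bsc_dispersion(p: float) -> float:
    """V for a BSC(p) under the uniform input: p(1-p) log^2((1-p)/p)."""
    return p * (1 - p) * math.log((1 - p) / p) ** 2


def phi(s: float, v: float) -> float:
    """G(s / sqrt(v)); a unit step at zero when v == 0."""
    if v == 0.0:
        return 1.0 if s >= 0 else 0.0
    return NormalDist().cdf(s / math.sqrt(v))


def second_order_rate(caps, disps, weights, eps, rate, tol=1e-12):
    """sup{S : w(c < rate) + sum over {c == rate} of w * phi(S, V) <= eps}."""
    below = sum(w for c, w in zip(caps, weights) if c < rate - tol)
    tied = [(v, w) for c, v, w in zip(caps, disps, weights)
            if abs(c - rate) <= tol]
    if below > eps or (below == eps and any(v > 0 for v, _ in tied)):
        return -math.inf  # the constraint fails for every S
    if below + sum(w for _, w in tied) <= eps:
        return math.inf   # the constraint holds for every S

    def g(s):
        return below + sum(w * phi(s, v) for v, w in tied)

    lo, hi = -1e3, 1e3    # assumed bracket for the solution
    for _ in range(200):  # bisection on the nondecreasing function g
        mid = 0.5 * (lo + hi)
        if g(mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo
```

For a single component with weight one and rate equal to its capacity, the routine reduces to $\sqrt{V}\, G^{-1}(\varepsilon)$, the familiar second-order term of a stationary memoryless channel.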


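C. Proof of Theorem 3

(Proof of Converse Part)

By definition, it holds that $I(P, W_\theta) \le c_{\theta,\Gamma}$ for all $\theta \in \Theta$ if $P$ satisfies $\mathrm{E}c(X_P) \le \Gamma$. Therefore, by (8) in Theorem 1, we have

\begin{align}
C_\varepsilon(\Gamma) &\le \sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup\left\{ R \,\middle|\, \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) \le \varepsilon \right\} \notag \\
&= \sup\left\{ R \,\middle|\, \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) \le \varepsilon \right\}. \tag{113}
\end{align}

□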


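(Proof of Direct Part)
Set

\[
\overline{R} := \sup\left\{ R \,\middle|\, \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) \le \varepsilon \right\} \tag{114}
\]

for notational simplicity. Consider an increasing sequence $R_1 < R_2 < \cdots \to \overline{R}$ such that

\[
\int_{\{\theta \mid c_{\theta,\Gamma} < R_i\}} dw(\theta) \le \varepsilon \quad (\forall i = 1, 2, \ldots). \tag{115}
\]

Then, we have

\[
\int_{\{\theta \mid c_{\theta,\Gamma} < \overline{R}\}} dw(\theta) \le \varepsilon \tag{116}
\]

by the continuity of probability measures. Now suppose that $\overline{R}$ is not an accumulation point of $E_{\Theta,\Gamma}$ to show a contradiction. Then, there exists some $\nu > 0$ such that

\[
(\overline{R} - \nu, \overline{R} + \nu) \cap E_{\Theta,\Gamma} = \emptyset. \tag{117}
\]

This implies that $\{\theta \mid \overline{R} \le c_{\theta,\Gamma} < \overline{R} + \nu\} = \emptyset$, and hence, we have

\[
\int_{\{\theta \mid c_{\theta,\Gamma} < \overline{R} + \nu\}} dw(\theta) = \int_{\{\theta \mid c_{\theta,\Gamma} < \overline{R}\}} dw(\theta) \le \varepsilon, \tag{118}
\]

which contradicts the definition of $\overline{R}$. Therefore, $\overline{R}$ is an accumulation point of $E_{\Theta,\Gamma}$. Since $E_{\Theta,\Gamma}$ is a closed set by Lemma 6, it holds that $\overline{R} \in E_{\Theta,\Gamma}$, and there exists some $\theta^* \in \Theta$ such that $\overline{R} = c_{\theta^*,\Gamma}$.

Fixing $P \in \Pi_{\theta^*,\Gamma}$ arbitrarily, we have

\begin{align}
\int_{\{\theta \mid I(P, W_\theta) < \overline{R}\}} dw(\theta) &= \int_{\{\theta \mid I(P, W_\theta) < \overline{R},\, c_{\theta,\Gamma} < \overline{R}\}} dw(\theta) + \int_{\{\theta \mid I(P, W_\theta) < \overline{R},\, c_{\theta,\Gamma} \ge \overline{R}\}} dw(\theta) \notag \\
&= \int_{\{\theta \mid I(P, W_\theta) < \overline{R},\, c_{\theta,\Gamma} < \overline{R}\}} dw(\theta), \tag{119}
\end{align}

where the last equality follows from the fact that there are no $\theta \in \Theta$ such that $c_{\theta,\Gamma} \ge \overline{R} = c_{\theta^*,\Gamma}$ and $I(P, W_\theta) < \overline{R}$ for $P \in \Pi_{\theta^*,\Gamma}$ by the definition of $\Gamma$-well-orderedness. Noticing that $\{\theta \mid I(P, W_\theta) < \overline{R},\, c_{\theta,\Gamma} < \overline{R}\} = \{\theta \mid c_{\theta,\Gamma} < \overline{R}\}$ for $P \in \Pi_{\theta^*,\Gamma}$ in (119), we have

\[
\int_{\{\theta \mid I(P, W_\theta) < \overline{R}\}} dw(\theta) = \int_{\{\theta \mid c_{\theta,\Gamma} < \overline{R}\}} dw(\theta) \le \varepsilon, \tag{120}
\]

and formula (8) in Theorem 1 indicates that $\overline{R} \le C_\varepsilon(\Gamma)$. □

D. Proof of Theorem 4
(Proof of Direct Part)

It apparently holds, with $G_w(R, S|P)$ as in (16), that

\[
\sup_{P:\, \mathrm{E}c(X_P) \le \Gamma} \sup\{S \mid G_w(R, S|P) \le \varepsilon\} \ge \sup_{P \in \Pi_{\theta^*,\Gamma}} \sup\{S \mid G_w(R, S|P) \le \varepsilon\} \tag{121}
\]

since any $P \in \Pi_{\theta^*,\Gamma}$ satisfies the cost constraint $\mathrm{E}c(X_P) \le \Gamma$. Therefore, by Theorem 2, any $S$ such that

\[
S < \sup_{P \in \Pi_{\theta^*,\Gamma}} \sup\{S' \mid G_w(R, S'|P) \le \varepsilon\} \tag{122}
\]

is $(\varepsilon, R|\Gamma)$-achievable. □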

(Proof of Converse Part)

Although the converse part can be established on the basis of Lemmas 2 and 5 in a manner similar to the converse proof of Theorem 1, here, instead of these lemmas, we use the following simple but powerful lower bound on the probability of decoding error, which is of independent interest and facilitates the proof of this converse part.

Lemma 7: Let $\{Q_\theta^n\}_{\theta \in \Theta}$ be a family of arbitrarily fixed output distributions on $\mathcal{Y}^n$. Every $(n, M_n, \varepsilon_n)$ code $\mathcal{C}_n$ for the mixed channel given in (1) satisfies

\[
\varepsilon_n \ge \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le \frac{1}{n} \log M_n - \eta \right\} - e^{-n\eta} \tag{123}
\]

with an arbitrary number $\eta > 0$, where $X^n$ is uniformly distributed on $\mathcal{C}_n$.

(Proof) See Appendix D. □

Remark 14: It should be noted that Lemma 7 holds for arbitrary alphabets $\mathcal{X}$, $\mathcal{Y}$ (not necessarily finite).

□
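The bound of Lemma 7 can be checked by brute force on a toy example. The sketch below is only an illustration under assumptions of our own choosing: a finite mixture of binary symmetric channels, a very short blocklength so that all output sequences can be enumerated, and $Q_\theta^n$ taken to be uniform on $\{0,1\}^n$ for every $\theta$. The function name `lemma7_sides` and its arguments are hypothetical.

```python
import itertools
import math


def lemma7_sides(crossovers, weights, codebook, decode, eta):
    """Return (error probability, right-hand side of the Lemma 7 bound)
    for a finite mixture of BSCs and Q_theta^n uniform (assumption)."""
    n = len(codebook[0])
    m = len(codebook)
    outputs = list(itertools.product((0, 1), repeat=n))
    q = 1.0 / len(outputs)                  # uniform output distribution
    threshold = math.log(m) / n - eta
    error = 0.0
    spectrum = 0.0
    for p, weight in zip(crossovers, weights):
        for index, codeword in enumerate(codebook):
            for y in outputs:
                flips = sum(a != b for a, b in zip(codeword, y))
                prob = p ** flips * (1 - p) ** (n - flips)
                if decode(y) != index:      # decoding error event
                    error += weight * prob / m
                # Event {(1/n) log(W/Q) <= (1/n) log M - eta}; zero-probability
                # outputs contribute nothing and are skipped.
                if prob > 0 and math.log(prob / q) / n <= threshold:
                    spectrum += weight * prob / m
    return error, spectrum - math.exp(-n * eta)


if __name__ == "__main__":
    book = list(itertools.product((0, 1), repeat=3))  # uncoded transmission
    print(lemma7_sides((0.1, 0.3), (0.5, 0.5), book, book.index, 0.5))
```

At such short blocklengths the bound is loose, and for low-rate codes it can even be negative; the example only confirms that the inequality holds for an arbitrary decoder and an arbitrary choice of $Q_\theta^n$.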

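Since formula (108) trivially holds in the cases $R < C_\varepsilon(\Gamma)$ (with $D_\varepsilon(R|\Gamma) = +\infty$) and $R > C_\varepsilon(\Gamma)$ (with $D_\varepsilon(R|\Gamma) = -\infty$), hereafter we shall prove it only for the case $R = C_\varepsilon(\Gamma)$, which is of our main interest. Assume that $S$ is $(\varepsilon, R|\Gamma)$-achievable. Then, by definition, for any given $\gamma > 0$ there exists an $(n, M_n, \varepsilon_n)$ code with cost constraint $\Gamma$ such that

\[
\frac{1}{n} \log M_n \ge R + \frac{S - \gamma}{\sqrt{n}} \quad (\forall n \ge n_0). \tag{124}
\]

Following a technique developed by Hayashi [6], let $Q_\theta^n$ be the output distribution on $\mathcal{Y}^n$ indexed by $\theta \in \Theta$ such that

\[
Q_\theta^n(\boldsymbol{y}) = \sum_{P_n \in \mathcal{T}_n} \frac{(P_n W_\theta)^{\times n}(\boldsymbol{y})}{N_n + 1} + \frac{(P_\theta W_\theta)^{\times n}(\boldsymbol{y})}{N_n + 1} \quad (\forall \theta \in \Theta,\ \forall \boldsymbol{y} \in \mathcal{Y}^n), \tag{125}
\]

where $\mathcal{T}_n$ with $N_n = |\mathcal{T}_n|$ denotes the set of all types on $\mathcal{X}^n$ and $P_\theta$ is an arbitrary input distribution in $\Pi_{\theta,\Gamma}$. It should be noted that the capacity-achieving output distribution $P_\theta W_\theta$ for $W_\theta$ is the same for all $P_\theta \in \Pi_{\theta,\Gamma}$ and this enables us to choose a particular $P_\theta \in \Pi_{\theta,\Gamma}$ later. Using this $\{Q_\theta^n\}_{\theta \in \Theta}$ we define as in (1^ . Lemma 7, with $\eta$ replaced by $\frac{\gamma}{\sqrt{n}}$, assures that the sequence of $(n, M_n, \varepsilon_n)$ codes (satisfying cost constraint $\Gamma$) is such that

\begin{align}
\varepsilon_n &\ge \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|X^n)}{Q_\theta^n(Y_\theta^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \right\} - e^{-\sqrt{n}\gamma} \notag \\
&= \sum_{\boldsymbol{x} \in \mathcal{C}_n} P_{X^n}(\boldsymbol{x}) \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x})}{Q_\theta^n(Y_\theta^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \,\middle|\, X^n = \boldsymbol{x} \right\} - e^{-\sqrt{n}\gamma}, \tag{126}
\end{align}

where $P_{X^n}$ is the uniform distribution on $\mathcal{C}_n$. This implies that there exists a codeword $\boldsymbol{x}_n \in \mathcal{C}_n$ such that

\[
\varepsilon_n \ge \int_{\Theta} dw(\theta) \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{Q_\theta^n(Y_\theta^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \,\middle|\, X^n = \boldsymbol{x}_n \right\} - e^{-\sqrt{n}\gamma}. \tag{127}
\]

Now, we partition the parameter space $\Theta$ as follows:

\begin{align}
\Theta_1 &:= \{\theta \in \Theta \mid c_{\theta,\Gamma} < R\}, \tag{128} \\
\Theta_2 &:= \{\theta \in \Theta \mid c_{\theta,\Gamma} = R\}, \tag{129} \\
\Theta_3 &:= \{\theta \in \Theta \mid c_{\theta,\Gamma} > R\}. \tag{130}
\end{align}

Using these partitioned spaces, we further bound (127) as

\[
\varepsilon_n \ge \int_{\Theta_1} dw(\theta) B_{\theta,n} + \int_{\Theta_2} dw(\theta) B_{\theta,n} - e^{-\sqrt{n}\gamma}, \tag{131}
\]

where we have set

\[
B_{\theta,n} := \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{Q_\theta^n(Y_\theta^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \,\middle|\, X^n = \boldsymbol{x}_n \right\}. \tag{132}
\]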


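Let $P_n \in \mathcal{T}_n$ denote the type of $\boldsymbol{x}_n$ (obviously, this $P_n$ satisfies $\mathrm{E}c(X_{P_n}) \le \Gamma$, where $X_{P_n}$ denotes the random variable subject to $P_n$). By (125), the probability term $B_{\theta,n}$ is lower bounded in two ways as

\[
B_{\theta,n} \ge \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_\theta W_\theta)^{\times n}(Y_\theta^n)} + \frac{\log(N_n + 1)}{n} \le R + \frac{S - 2\gamma}{\sqrt{n}} \,\middle|\, X^n = \boldsymbol{x}_n \right\} =: \alpha_{\theta,n} \tag{133}
\]

and

\[
B_{\theta,n} \ge \Pr\left\{ \frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)} + \frac{\log(N_n + 1)}{n} \le R + \frac{S - 2\gamma}{\sqrt{n}} \,\middle|\, X^n = \boldsymbol{x}_n \right\} =: \beta_{\theta,n}. \tag{134}
\]

It should be noted that both $B_{\theta,n}$ and $\alpha_{\theta,n}$ do not depend on the choice of $P_\theta \in \Pi_{\theta,\Gamma}$ in (125) since $P_\theta W_\theta$ is unique. Notice that $\frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_n W_\theta)^{\times n}(Y_\theta^n)}$ in (134) (cf. (53)) is a sum of conditionally independent random variables given $X^n = \boldsymbol{x}_n$ (under $W_\theta^n(\cdot|\boldsymbol{x}_n)$) with mean $I(P_n, W_\theta)$ and variance $V_{\theta,P_n}$, which is given as in (55). Moreover,

\[
\frac{1}{n} \log \frac{W_\theta^n(Y_\theta^n|\boldsymbol{x}_n)}{(P_\theta W_\theta)^{\times n}(Y_\theta^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{P_\theta W_\theta(Y_{\theta,i})} \tag{135}
\]

is a sum of conditionally independent random variables given $X^n = \boldsymbol{x}_n = (x_1, x_2, \ldots, x_n)$ (under $W_\theta^n(\cdot|\boldsymbol{x}_n)$) with mean

\[
\mathrm{E}\left\{ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{P_\theta W_\theta(Y_{\theta,i})} \,\middle|\, X^n = \boldsymbol{x}_n \right\} = \sum_{x \in \mathcal{X}} P_n(x) D(W_\theta(\cdot|x) \| P_\theta W_\theta) \tag{136}
\]

and variance

\begin{align}
\mathrm{V}\left\{ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \log \frac{W_\theta(Y_{\theta,i}|x_i)}{P_\theta W_\theta(Y_{\theta,i})} \,\middle|\, X^n = \boldsymbol{x}_n \right\} &= \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_n(x) W_\theta(y|x) \left( \log \frac{W_\theta(y|x)}{P_\theta W_\theta(y)} - D(W_\theta(\cdot|x) \| P_\theta W_\theta) \right)^2 \notag \\
&=: V_{\theta,P_\theta}(\boldsymbol{x}_n). \tag{137}
\end{align}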


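Since $\{P_n\}_{n=1}^{\infty}$ is a sequence in $\mathcal{P}(\mathcal{X})$, which is compact, it always contains a converging subsequence $\{P_{n_1}, P_{n_2}, \ldots\}$, where $n_1 < n_2 < \cdots \to \infty$. We denote the convergent point by $P_0$:

\[
\lim_{i \to \infty} P_{n_i} = P_0, \tag{138}
\]

where it should be noticed that $P_0$ also satisfies the cost constraint $\mathrm{E}c(X_{P_0}) \le \Gamma$. For notational simplicity, we relabel $n_k$ as $m = n_1, n_2, \ldots$. For this subsequence, we shall evaluate

\[
A_m^{(1)} := \int_{\Theta_1} dw(\theta) B_{\theta,m} \quad \text{and} \quad A_m^{(2)} := \int_{\Theta_2} dw(\theta) B_{\theta,m} \quad (m = n_1, n_2, \ldots), \tag{139}
\]

where (131) is now expressed as $\varepsilon_m \ge A_m^{(1)} + A_m^{(2)} - e^{-\sqrt{m}\gamma}$.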

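We first evaluate $A_m^{(1)}$. Fix $\theta \in \Theta_1$ arbitrarily. In this case, $I(P_m, W_\theta) \le c_{\theta,\Gamma} < R$, so $\beta_{\theta,m}$ on the right-hand side of (134) becomes

\begin{align}
\beta_{\theta,m} &= \Pr\left\{ \frac{1}{m} \log \frac{W_\theta^m(Y_\theta^m|\boldsymbol{x}_m)}{(P_m W_\theta)^{\times m}(Y_\theta^m)} - I(P_m, W_\theta) \le R - I(P_m, W_\theta) + \frac{S - 2\gamma}{\sqrt{m}} - \frac{\log(N_m + 1)}{m} \,\middle|\, X^m = \boldsymbol{x}_m \right\} \notag \\
&\to 1 \quad (m \to \infty), \tag{140}
\end{align}

where the convergence is due to the weak law of large numbers. By (134), (140), and Fatou's lemma, we obtain

\begin{align}
\liminf_{m \to \infty} A_m^{(1)} &\ge \liminf_{m \to \infty} \int_{\Theta_1} dw(\theta) \beta_{\theta,m} \notag \\
&\ge \int_{\Theta_1} dw(\theta) \liminf_{m \to \infty} \beta_{\theta,m} \notag \\
&= \int_{\Theta_1} dw(\theta) = \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta). \tag{141}
\end{align}

Next, we turn to evaluating $A_m^{(2)}$. We consider two cases according to whether the convergent point $P_0 = \lim_{m \to \infty} P_m$ is in $\Pi_{\theta^*,\Gamma}$ or not, where $c_{\theta^*,\Gamma} = R = C_\varepsilon(\Gamma)$. More precisely, we will bound $A_m^{(2)}$ from below in two ways as

\[
A_m^{(2)} = \int_{\Theta_2} dw(\theta) B_{\theta,m} \ge
\begin{cases}
\int_{\Theta_2} dw(\theta) \alpha_{\theta,m} & \text{if } P_0 \in \Pi_{\theta^*,\Gamma} \\
\int_{\Theta_2} dw(\theta) \beta_{\theta,m} & \text{if } P_0 \notin \Pi_{\theta^*,\Gamma}.
\end{cases} \tag{142}
\]

(i) Consider the case of $P_0 \notin \Pi_{\theta^*,\Gamma}$. We define

\[
\mathcal{V}_\tau := \{P \mid I(P, W_\theta) \ge c_{\theta,\Gamma} - \tau\} \quad (\forall \tau > 0), \tag{143}
\]

where $c_{\theta,\Gamma} = c_{\theta^*,\Gamma} = R = C_\varepsilon(\Gamma)$ as we are now considering the case of $\theta \in \Theta_2$. Then, for each $\theta \in \Theta_2$ there exists some $\tau_\theta > 0$ such that $P_0 \notin \mathcal{V}_{2\tau_\theta}$. This implies that $P_m \notin \mathcal{V}_{\tau_\theta}$ for all $m \ge m_0$ with some positive number $m_0 > 0$. Then, by Chebyshev's inequality, it holds that

\[
\beta_{\theta,m} \ge 1 - \frac{\max_P V_{\theta,P}}{\left( S - 2\gamma + \sqrt{m}\,\tau_\theta - \frac{\log(N_m + 1)}{\sqrt{m}} \right)^2}, \tag{144}
\]

where the right-hand side tends to one because $\max_P V_{\theta,P} < +\infty$, indicating that $\beta_{\theta,m} \to 1$ $(m \to \infty)$. By Fatou's lemma and (134), we obtain

\begin{align}
\liminf_{m \to \infty} A_m^{(2)} &\ge \int_{\Theta_2} dw(\theta) \liminf_{m \to \infty} B_{\theta,m} \notag \\
&\ge \int_{\Theta_2} dw(\theta) \liminf_{m \to \infty} \beta_{\theta,m} \notag \\
&\ge \int_{\Theta_2} dw(\theta) \liminf_{m \to \infty} \left( 1 - \frac{\max_P V_{\theta,P}}{\left( S - 2\gamma + \sqrt{m}\,\tau_\theta - \frac{\log(N_m + 1)}{\sqrt{m}} \right)^2} \right) \notag \\
&= \int_{\Theta_2} dw(\theta). \tag{145}
\end{align}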

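(ii) Next, consider the case of $P_0 \in \Pi_{\theta^*,\Gamma}$. Since $c_{\theta,\Gamma} = c_{\theta^*,\Gamma}$ for $\theta \in \Theta_2$ and hence $\Pi_{\theta,\Gamma} = \Pi_{\theta^*,\Gamma}$ (cf. Remark 8), in (125) we can choose $P_\theta \in \Pi_{\theta,\Gamma}$ for each $\theta \in \Theta_2$ so that

\[
\lim_{m \to \infty} P_m = P_0 = P_\theta, \tag{146}
\]

where we notice that $B_{\theta,n}$ and $\alpha_{\theta,n}$ do not depend on the choice of $P_\theta \in \Pi_{\theta,\Gamma} = \Pi_{\theta^*,\Gamma}$. Since again $P_\theta \in \Pi_{\theta,\Gamma}$ and $c_{\theta,\Gamma} = R = C_\varepsilon(\Gamma)$ for $\theta \in \Theta_2$, we have

\[
\sum_{x \in \mathcal{X}} P_m(x) D(W_\theta(\cdot|x) \| P_\theta W_\theta) \le c_{\theta,\Gamma} = R \tag{147}
\]

by the Kuhn-Tucker theorem. Indeed, the Kuhn-Tucker theorem asserts that for finite $\mathcal{X}$ and $\mathcal{Y}$, it holds for all $x \in \mathcal{X}$ that

\[
D(W_\theta(\cdot|x) \| P_\theta W_\theta) \le c_{\theta,\Gamma} + \lambda_0 (c(x) - \Gamma) \tag{148}
\]

with some $\lambda_0 \ge 0$ (cf. [5, Lemma 3.7.1]). By taking the average with $P_m$ for both sides of (148), we obtain

\[
\sum_{x \in \mathcal{X}} P_m(x) D(W_\theta(\cdot|x) \| P_\theta W_\theta) \le c_{\theta,\Gamma} + \lambda_0 \left( \sum_{x \in \mathcal{X}} P_m(x) c(x) - \Gamma \right), \tag{149}
\]

which implies the inequality in (147) since $P_m$ satisfies the cost constraint $\mathrm{E}c(X_{P_m}) \le \Gamma$. By (147), we have

\begin{align}
\alpha_{\theta,m} &= \Pr\left\{ \frac{1}{\sqrt{m}} \left( \log \frac{W_\theta^m(Y_\theta^m|\boldsymbol{x}_m)}{(P_\theta W_\theta)^{\times m}(Y_\theta^m)} - mR \right) \le S - 2\gamma - \frac{\log(N_m + 1)}{\sqrt{m}} \,\middle|\, X^m = \boldsymbol{x}_m \right\} \notag \\
&\ge \Pr\left\{ \frac{1}{\sqrt{m}} \left( \log \frac{W_\theta^m(Y_\theta^m|\boldsymbol{x}_m)}{(P_\theta W_\theta)^{\times m}(Y_\theta^m)} - m \sum_{x \in \mathcal{X}} P_m(x) D(W_\theta(\cdot|x) \| P_\theta W_\theta) \right) \le S - 2\gamma - \frac{\log(N_m + 1)}{\sqrt{m}} \,\middle|\, X^m = \boldsymbol{x}_m \right\} =: \zeta_{\theta,m}. \tag{150}
\end{align}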


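Since $V_{\theta,P_\theta}(\boldsymbol{x}_m) < +\infty$ and the third absolute moments of the summands in (135) are also bounded (cf. [5, Remark 3.1.1], [17, Lemma 62], ifTS] Lemma 7]), by the Berry-Esseen theorem and the relations in (136) and (137), we have

\[
\left| \zeta_{\theta,m} - G\left( \frac{S - 2\gamma - \frac{\log(N_m + 1)}{\sqrt{m}}}{\sqrt{V_{\theta,P_\theta}(\boldsymbol{x}_m)}} \right) \right| \le \frac{\nu_0}{\sqrt{m}} \quad (\forall m = n_1, n_2, \ldots), \tag{151}
\]

where $G(\cdot)$ is defined as in (13) and $\nu_0 > 0$ is a positive constant. Notice here that $V_{\theta,P_\theta}(\boldsymbol{x}_m) \to V_{\theta,P_\theta}$ as $m \to \infty$ owing to (146). For $\theta \in \Theta_2$ with $V_{\theta,P_\theta} > 0$, we have $V_{\theta,P_\theta}(\boldsymbol{x}_m) > 0$ for all $m \ge m_1$ with some $m_1 > 0$. Since $\log(N_m + 1) \le |\mathcal{X}| \log(m + 1)$ and $G(\cdot)$ is continuous, by letting $m \to \infty$ we obtain

\[
\liminf_{m \to \infty} \zeta_{\theta,m} \ge G\left( \frac{S - 3\gamma}{\sqrt{V_{\theta,P_\theta}}} \right) = \Phi_{\theta,P_\theta}(S - 3\gamma), \tag{152}
\]

where we have used the relation in (13) for the equality. For $\theta \in \Theta_2$ with $V_{\theta,P_\theta} = 0$, $G(z/\sqrt{V_{\theta,P_\theta}})$ is the step function which takes zero for $z < 0$ and one otherwise. Then, we have (152) for such $\theta \in \Theta_2$, too. Putting (133), (139), (150), and (152) together, we obtain

\begin{align}
\liminf_{m \to \infty} A_m^{(2)} &\ge \liminf_{m \to \infty} \int_{\Theta_2} dw(\theta) \alpha_{\theta,m} \notag \\
&\ge \liminf_{m \to \infty} \int_{\Theta_2} dw(\theta) \zeta_{\theta,m} \notag \\
&\ge \int_{\Theta_2} dw(\theta) \liminf_{m \to \infty} \zeta_{\theta,m} \notag \\
&\ge \int_{\Theta_2} \Phi_{\theta,P_\theta}(S - 3\gamma)\, dw(\theta) \notag \\
&\ge \inf_{P \in \Pi_{\theta^*,\Gamma}} \int_{\Theta_2} \Phi_{\theta,P}(S - 3\gamma)\, dw(\theta), \tag{153}
\end{align}

where we have used Fatou's lemma in the third inequality and the relation $P_\theta \in \Pi_{\theta,\Gamma} = \Pi_{\theta^*,\Gamma}$ in the last inequality.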

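To finalize the evaluation of $A_m^{(2)}$ for both cases, combining (145) and (153) leads to

\begin{align}
\liminf_{m \to \infty} A_m^{(2)} &\ge \inf_{P \in \Pi_{\theta^*,\Gamma}} \int_{\{\theta \mid c_{\theta,\Gamma} = R\}} \Phi_{\theta,P}(S - 3\gamma)\, dw(\theta) \notag \\
&= \inf_{P \in \Pi_{\theta^*,\Gamma}} \int_{\{\theta \mid I(P, W_\theta) = R\}} \Phi_{\theta,P}(S - 3\gamma)\, dw(\theta) \tag{154}
\end{align}

because $\Theta_2 = \{\theta \mid c_{\theta,\Gamma} = R\} = \{\theta \mid I(P, W_\theta) = R\}$ for all $P \in \Pi_{\theta^*,\Gamma}$ with $R = c_{\theta^*,\Gamma}$.

Now, we are in a position to synthesize all evaluations. By the definition of achievability, it follows from (131), which means $\varepsilon_m \ge A_m^{(1)} + A_m^{(2)} - e^{-\sqrt{m}\gamma}$, (141), and (154) that

\begin{align}
\varepsilon &\ge \limsup_{n \to \infty} \varepsilon_n \notag \\
&\ge \limsup_{m \to \infty} \varepsilon_m \notag \\
&\ge \limsup_{m \to \infty} \left( A_m^{(1)} + A_m^{(2)} \right) \notag \\
&\ge \liminf_{m \to \infty} A_m^{(1)} + \liminf_{m \to \infty} A_m^{(2)} \notag \\
&\ge \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) + \inf_{P \in \Pi_{\theta^*,\Gamma}} \int_{\{\theta \mid I(P, W_\theta) = R\}} \Phi_{\theta,P}(S - 3\gamma)\, dw(\theta) \notag \\
&= \inf_{P \in \Pi_{\theta^*,\Gamma}} \left\{ \int_{\{\theta \mid c_{\theta,\Gamma} < R\}} dw(\theta) + \int_{\{\theta \mid I(P, W_\theta) = R\}} \Phi_{\theta,P}(S - 3\gamma)\, dw(\theta) \right\}. \tag{155}
\end{align}

We note that $\Theta_1 = \{\theta \mid c_{\theta,\Gamma} < R\} = \{\theta \mid I(P, W_\theta) < R\}$ for any $P \in \Pi_{\theta^*,\Gamma}$ with $R = c_{\theta^*,\Gamma}$ due to the definition of $\Gamma$-well-orderedness, so it follows from (155) that

\[
S - 3\gamma \le \sup_{P \in \Pi_{\theta^*,\Gamma}} \sup\left\{ S \,\middle|\, \int_{\{\theta \mid I(P, W_\theta) < R\}} dw(\theta) + \int_{\{\theta \mid I(P, W_\theta) = R\}} \Phi_{\theta,P}(S)\, dw(\theta) \le \varepsilon \right\}. \tag{156}
\]

Since $\gamma > 0$ is arbitrary, we have completed the proof of the converse part. □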

VI. Concluding Remarks 

In this paper, we have established the coding theorem for the $(\varepsilon|\Gamma)$-capacity of mixed memoryless channels with general mixture. For mixed memoryless channels with general mixture, a direct part of the second-order coding theorem has also been provided. The class of $\Gamma$-well-ordered mixed memoryless channels, whose component channels are ordered according to their capacity with cost constraint $\Gamma$, has been introduced to further analyze the second-order $(\varepsilon, R|\Gamma)$-capacity. The $\Gamma$-well-orderedness allows us to establish a second-order converse theorem, which coincides with the direct theorem for mixed memoryless channels with general mixture. The obtained results include several known results as special cases such as capacity characterizations for mixed memoryless channels with general mixture [1], [5] and for regular decomposable channels with stationary memoryless components IfT^ , an $\varepsilon$-capacity characterization for mixed memoryless channels with countable mixture [20], and second-order $(\varepsilon, R)$-capacity characterizations for additive-noise channels with finite mixture ifT^ and for well-ordered memoryless channels with finite mixture [21].

Tomamichel and Tan llT^ have recently discussed mixed memoryless channels with finite $\Theta$ by treating them as memoryless channels with finite states. In other words, channel state $\theta \in \Theta$ is selected with probability $w(\theta)$ before the transmission of a codeword of length $n$. In the scenario where the encoder and decoder can observe channel state $\theta$, characterizations for the $(\varepsilon|\Gamma)$-capacity and $(\varepsilon, R|\Gamma)$-capacity have been discussed. Indeed, when $\Theta$ is finite and the encoder and decoder can access the channel state information, the $(\varepsilon|\Gamma)$-capacity and $(\varepsilon, R|\Gamma)$-capacity are characterized as the natural counterparts of those in (107) and (108), respectively, even for mixed memoryless channels whose component channels are not necessarily $\Gamma$-well-ordered. We can easily extend this result to mixed memoryless channels with general mixture (general states).

As noted in Sect. V-D, Lemma 7 holds for mixed channels with general input and output alphabets ($\mathcal{X}$ and $\mathcal{Y}$), and we can also establish the converse part of the first-order coding theorem which corresponds to Theorem 1 in the case with finite $\mathcal{X}$ and general $\mathcal{Y}$. However, the proof of a direct part in this case may be trickier because we rely on the upper-decomposition technique of Lemma 4 (that is, the method of types). Extensions of the established formulas for mixed channels with general input and/or output alphabets are interesting and practically important research subjects.

Appendix A 
Proof of Lemma 3

Given an arbitrary i.i.d. product probability distribution Qq on 3/", let be given as in (l30l) . Since 
Q^{y) is the expectation of Qgiy) with respect to w{9), Markov’s inequality implies that 

Pr{0 e 0(y)} > 1-e-^ i^yey^). (157) 

We also have 


Pr{0e0=} = Pr{0eUfc0(^fc)'=} 

< ^ Pr e QiSkY} <{n + l)l^le- 

k 

Here, A‘^ denotes the complement of a set A. Therefore, 

Pr e 0n} > 1 - (n + l)l^le-^. 

In a similar way, we also have 

Pr G 0„| > 1 - (n + 

Then, it holds for 0* = 0„ n that 

Pr{0 G 0 ;} > 1 -2(n + 

thus, yielding (1351) . 


(158) 


(159) 

(160) 


(161) 

□ 


Appendix B 
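Proof of Lemma 4

The proof is implicitly contained in Han [5]. We summarize it here for the reader's convenience.
For a given $\gamma > 0$, we define a set

\[
D_n := \left\{ \boldsymbol{y} \in \mathcal{Y}^n \,\middle|\, \frac{1}{n} \log P_{Y_\theta^n}(\boldsymbol{y}) - \frac{1}{n} \log P_{Y^n}(\boldsymbol{y}) \le -\frac{\gamma}{\sqrt{n}} \right\} \tag{162}
\]

for $\theta \in \Theta$. Then, it holds that

\begin{align}
\Pr\{Y_\theta^n \in D_n\} &= \sum_{\boldsymbol{y} \in D_n} P_{Y_\theta^n}(\boldsymbol{y}) \notag \\
&\le \sum_{\boldsymbol{y} \in D_n} P_{Y^n}(\boldsymbol{y}) e^{-\sqrt{n}\gamma} \notag \\
&\le e^{-\sqrt{n}\gamma}. \tag{163}
\end{align}

Hence, for any real number $z_n$ we have

\begin{align}
\Pr\left\{ -\frac{1}{n} \log P_{Y^n}(Y_\theta^n) \le z_n \right\} &\le \Pr\left\{ -\frac{1}{n} \log P_{Y^n}(Y_\theta^n) \le z_n,\ Y_\theta^n \notin D_n \right\} + \Pr\{Y_\theta^n \in D_n\} \notag \\
&\le \Pr\left\{ -\frac{1}{n} \log P_{Y_\theta^n}(Y_\theta^n) \le z_n + \frac{\gamma}{\sqrt{n}} \right\} + e^{-\sqrt{n}\gamma} \tag{164}
\end{align}

for all $\theta \in \Theta$. By using the above inequality we have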


^ , 1 , W^(Y.^\X^) 

Pr - log . . < Zn 

n Pyn [Y^) 


= Pr <; -logPP"(F,’^|X'^) - llogPy.(y") < 
n n 


< Pr <1 - log W^{Ye^\X^) - - log Py.(r^") <Zn + ^\ + 
n n ® vn 


< Pr<;ilogPP,"(F,"|X")--logPy.(y,")<^„ + ^+ ^ 

n n ® 


n xVrp 


+ e 


-Vn-y 


= PwiioglW^ 




7 


n ^ 


^-Vny 


(165) 


for 9 G 0*, where the last inequality is due to the inequality WQ{y\x) < e^W'^{y\x) for 6 G 0*. This 
completes the proof. □

Appendix C 
Proof of Lemma 5

This proof is also implicitly contained in Han [5] in the case of $Q^n = P_{Y^n}$, where $P_{Y^n}$ denotes the output distribution on $\mathcal{Y}^n$ due to input $X^n$ via channel $W^n$. Similarly to (164), we obtain


n 


Pr - log W^{Y^\X^) < ^4 > Pr - log lP"(y"|X") < 


7 


n 


,-Vn 7 


(166) 


for 9 E Q. Using this inequality, we have 


PriXgYXim 


^ Z-n 


= Pr<i-loglU’^(F,"|X")--logQ"(y,’^)<z, 

n 


n 


> Pr <; - log lU,"(y,"|X") - - log Q"(y,") < z, 
n n 


7 


^-y/ny 


> Pr -i i log lV;(y,"|X") - i logQJ(y,") < r„ - ^ ^ 


^-y/ny 


A < 3 s(y„") - 


^ Zr}. 


7 


n ^ 




(167) 


for 9 G 0*, where the last inequality is due to the inequality Q^iy) < e^Q'^iy) for 9 G 0*. Thus, we 
complete the proof. □

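Appendix D
Proof of Lemma 7

For any given $(n, M_n, \varepsilon_n)$ code $\mathcal{C}_n = \{\boldsymbol{u}_1, \boldsymbol{u}_2, \ldots, \boldsymbol{u}_{M_n}\}$ with decoding regions $\{D_i\}_{i=1}^{M_n}$, it follows from (1) and dS]) that

\begin{align}
\varepsilon_n &= \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(D_i^c|\boldsymbol{u}_i) \notag \\
&= \frac{1}{M_n} \sum_{i=1}^{M_n} \int_{\Theta} dw(\theta)\, W_\theta^n(D_i^c|\boldsymbol{u}_i) \notag \\
&= \int_{\Theta} dw(\theta) \left\{ \frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(D_i^c|\boldsymbol{u}_i) \right\}, \tag{168}
\end{align}

where the last equality in (168) is obtained by exchanging the integral and the sum of finitely many terms. Here, the term inside the brace $\{\cdot\}$ in (168) corresponds to the average error probability with the decoding regions $\{D_i\}_{i=1}^{M_n}$ over $W_\theta$. Then, a simple but key observation is that each of such terms indexed by $\theta \in \Theta$, which is characterized by the common decoding regions $\{D_i\}_{i=1}^{M_n}$, may be lower bounded separately using another set depending on $\theta \in \Theta$.

Define the set

\[
B_{\theta,i} := \left\{ \boldsymbol{y} \in \mathcal{Y}^n \,\middle|\, \frac{1}{n} \log \frac{W_\theta^n(\boldsymbol{y}|\boldsymbol{u}_i)}{Q_\theta^n(\boldsymbol{y})} \le \frac{1}{n} \log M_n - \eta \right\}. \tag{169}
\]

Then the term inside the brace $\{\cdot\}$ in (168) can be bounded as

\begin{align}
\frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(D_i^c|\boldsymbol{u}_i) &\ge \frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(D_i^c \cap B_{\theta,i}|\boldsymbol{u}_i) \notag \\
&= \frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(B_{\theta,i}|\boldsymbol{u}_i) - \frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(D_i \cap B_{\theta,i}|\boldsymbol{u}_i), \tag{170}
\end{align}

where the equality in (170) follows from the relation

\[
D_i^c \cap B_{\theta,i} = B_{\theta,i} \setminus (D_i \cap B_{\theta,i}). \tag{171}
\]

We focus on the second term in (170). By definition, every $\boldsymbol{y} \in B_{\theta,i}$ satisfies

\[
W_\theta^n(\boldsymbol{y}|\boldsymbol{u}_i) \le M_n e^{-n\eta} Q_\theta^n(\boldsymbol{y}). \tag{172}
\]

Then the second term in (170) is bounded as

\begin{align}
\frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(D_i \cap B_{\theta,i}|\boldsymbol{u}_i) &= \frac{1}{M_n} \sum_{i=1}^{M_n} \sum_{\boldsymbol{y} \in D_i \cap B_{\theta,i}} W_\theta^n(\boldsymbol{y}|\boldsymbol{u}_i) \notag \\
&\le e^{-n\eta} \sum_{i=1}^{M_n} \sum_{\boldsymbol{y} \in D_i \cap B_{\theta,i}} Q_\theta^n(\boldsymbol{y}) \tag{173} \\
&\le e^{-n\eta} \sum_{i=1}^{M_n} Q_\theta^n(D_i) = e^{-n\eta}, \tag{174}
\end{align}

where (172) is used to obtain (173), and the fact that the decoding regions are disjoint and cover $\mathcal{Y}^n$, so that $\sum_{i=1}^{M_n} Q_\theta^n(D_i) = 1$, is used to obtain the equality in (174).

Plugging (174) into (170) yields

\[
\frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(D_i^c|\boldsymbol{u}_i) \ge \frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(B_{\theta,i}|\boldsymbol{u}_i) - e^{-n\eta}. \tag{175}
\]

Thus, the left-hand side of (168) is lower bounded as

\[
\varepsilon_n \ge \int_{\Theta} dw(\theta)\, \frac{1}{M_n} \sum_{i=1}^{M_n} W_\theta^n(B_{\theta,i}|\boldsymbol{u}_i) - e^{-n\eta}, \tag{176}
\]

which is equivalent to (123). □

Footnote to (175): Inequality (175) is Hayashi-Nagaoka's lower bound on the probability of decoding error, which was originally established for the quantum channel setting [7], applied to the component channel $W_\theta$. The derivation is essentially the same as, but slightly more direct than, the original derivation (cf. [6, Sect. IX-B]).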